Notes to Robert et al.: Model criticism informs model choice and model comparison


Authors: Oliver Ratmann, Christophe Andrieu, Carsten Wiuf, Sylvia Richardson

Oliver Ratmann*,†, Christophe Andrieu‡, Carsten Wiuf§ and Sylvia Richardson¶

*Biology Department, Duke University, Box 90338, Durham, NC 27708, USA; †Statistical and Applied Mathematical Sciences Institute, Research Triangle Park, NC 27709, USA; ‡Department of Mathematics, University of Bristol, Bristol, United Kingdom; §Bioinformatics Research Center, University of Aarhus, Aarhus, Denmark; ¶Centre for Biostatistics, Imperial College London, London, United Kingdom. Email: oliver.ratmann@duke.edu

In their letter to PNAS and a comprehensive set of notes on arXiv [1, 2], Christian Robert, Kerrie Mengersen and Carla Chen (RMC) represent our approach to model criticism in situations when the likelihood cannot be computed as a way to "contrast several models with each other". In addition, guided by an analysis of scalar error terms on simple examples, RMC argue that model assessment with Approximate Bayesian Computation under model uncertainty (ABCμ) is unduly challenging and question its Bayesian foundations. We thank RMC for their interest and their detailed comments on our work, which give us an opportunity to clarify the construction of ABCμ and to explain further the utility of ABCμ for the purpose of model criticism. Here, we provide a comprehensive set of answers to RMC's comments, which go beyond our short response [3]. For the sake of clarity, we re-state RMC's main points in italics before we answer each of them in turn.

We wish to emphasize that the use of multiple error terms ε_{1:K} is a necessary and integral part of ABCμ. In the first section of [4], we introduced ABCμ with the number of error terms set to K = 1 to keep the presentation simple.
In retrospect, we hope that this initial simplification did not lead to confusion (although in later sections and in our applications we clearly use multiple error terms).

Introduction and notation. Approximate Bayesian Computation (ABC) exploits model simulations x of a data-generating process M for sampling from approximate posterior distributions of the model parameters θ [5]. Typically, such predictions form the basis for model criticism [6], and we propose to use the data already generated by Monte Carlo implementations of ABC for this purpose too [4]. In ABCμ, the dual use of the model predictions is reflected in an extension of the state space of the targeted random variables: whenever the simulated summaries S(x) = (S_1(x), …, S_K(x)), x ∼ f(·|θ, M), are sufficiently close to the observed summaries S(x_0), we retain not only θ but also the computed discrepancies. The rationale of ABCμ is that small discrepancies between x and the observed data x_0 indicate favorable θ, whereas if these discrepancies are always large, the data-generating process (in short: model) M cannot describe the observed data well. The full potential of ABCμ is realized when we compute multiple discrepancies, each for one summary statistic S_k, ρ_k(S_k(x), S_k(x_0)). From first principles, we derived in [4] the sampling density of the accepted pairs (θ, (ρ_k(S_k(x), S_k(x_0)))_{1:K}), which we denote by

    f_{ρ,τ}(θ, ε_{1:K} | x_0, M) ∝ ξ_{x_0,θ}(ε_{1:K}) π_θ(θ | M) π_{ε_{1:K}}(ε_{1:K} | M).    (1)

We obtained a formula for the "augmented likelihood" ξ_{x_0,θ}(ε_{1:K}), which enables us to relate the posterior error density

    f_{ρ,τ}(ε_{1:K} | x_0, M) = ∫ f_{ρ,τ}(θ, ε_{1:K} | x_0, M) dθ

to the prior predictive error density, a well-known Bayesian quantity that was systematically discussed in a seminal paper [7] by Box (when K = 1 and ρ(S(x), S(x_0)) = x − x_0).
We have

    f_{ρ,τ}(ε_{1:K} | x_0, M) ∝ π_{ε_{1:K}}(ε_{1:K} | M) × L_ρ(ε_{1:K} | M)    (2)

where the prior predictive error density is given by

    L_ρ(ε_{1:K} | M) = ∫ δ{ (ρ_k(S_k(x), S_k(x_0)) = ε_k)_{1:K} } π(x | M) dx,

and π(x | M) = ∫ f(x | θ, M) π_θ(θ | M) dθ denotes the prior predictive (data) density. The shorthand δ notation represents a limit of functions, as detailed in Section S1.1 of the PNAS Supplementary Material [4]. The density π_{ε_{1:K}} is fully determined by the ABC kernel in the likelihood approximation,

    f_{ρ,τ}(θ | x_0, M) = ∫ f_{ρ,τ}(θ, ε_{1:K} | x_0, M) dε_{1:K} ∝ π_θ(θ | M) ∫ π_{ε_{1:K}}( (ρ_k(S_k(x), S_k(x_0)))_{1:K} | M ) f(x | θ, M) dx,    (3)

and can be interpreted as a prior density [8]. The relationship in Eq. 2 enables us to associate a statistical interpretation with our posterior errors and to relate them to other Bayesian quantities.

Standard assumptions in ABC and ABCμ. We assume that (A1) π_{ε_{1:K}} factorizes into ∏_{k=1}^K π_{ε_k}, is centered at zero and only depends on a multi-dimensional scale parameter τ = (τ_1, …, τ_K). The main reason behind (A1) is that otherwise, the same aspects of the data might be used to adjust the ABC kernel (or "prior" density) π_{ε_{1:K}} as well as the magnitude of the errors ρ_k(S_k(x), S_k(x_0)), and hence (potentially) more than once to inform our quantities of interest f_{ρ,τ}(θ | x_0, M) and f_{ρ,τ}(ε_{1:K} | x_0, M). Furthermore, ABCμ might falsely suggest rejecting the hypothesis that a model is an adequate representation of the data if π_{ε_{1:K}} is not centered at zero. Typical choices are

    π_{ε_k}(ε_k | M) = 1/τ_k 1{ |ε_k| ≤ τ_k/2 },
    π_{ε_k}(ε_k | M) = (2πτ_k²)^{−1/2} exp( −ε_k² / (2τ_k²) ),
    π_{ε_k}(ε_k | M) = 1/τ_k exp( −2|ε_k| / τ_k ).

We emphasize that in ABC and ABCμ, (A2) the scale parameter τ of the prior π_{ε_{1:K}} is in general chosen as small as possible.
Otherwise, if all model simulations are "acceptable", we have that

    f_{ρ,τ}(θ | x_0, M) ∝ π_θ(θ | M) ∫ π_{ε_{1:K}}( (ρ_k(S_k(x), S_k(x_0)))_{1:K} | M ) f(x | θ, M) dx = π_θ(θ | M) ∫ const × f(x | θ, M) dx = π_θ(θ | M).

Furthermore, (A3) the compound function x → ρ(S(x), S(x_0)) must be sensitive to changes in θ. Otherwise, we obtain

    f_{ρ,τ}(θ | x_0, M) ∝ ∫ π_ε(const) f(x | θ, M) dx π_θ(θ | M) = π_θ(θ | M).

The idea is to construct useful discrepancies which reflect changes in the simulated data as θ changes. (A4) As in ABC, we require that ρ_k(S_k(x), S_k(x_0)) = 0 if and only if S_k(x) = S_k(x_0). In contrast to most implementations of ABC, these discrepancies should be real-valued rather than non-negative. For example, in the case of scalar summaries, we use ρ_k(S_k(x), S_k(x_0)) = S_k(x) − S_k(x_0) instead of ρ_k(S_k(x), S_k(x_0)) = |S_k(x) − S_k(x_0)| [4]. We seek to construct (A5) roughly symmetric predictive error densities L_ρ(ε_{1:K} | M) with mode at zero under the null hypothesis that the prior model is an adequate representation of the data. Otherwise, negative small errors (−τ_k ≤ ε_k ≤ 0) may be significantly more (or less) frequent than positive small errors (0 ≤ ε_k ≤ τ_k) under the null, and conditioning on error magnitude could result in a large negative (or positive) posterior mean error even if the prior model is correct. Finally, we assume (A6) that the cumulative distribution function

    P_{θ,x_0}( ε_1 ∈ E_1, …, ε_K ∈ E_K ) = ∫_X 1{ (ρ_k(S_k(x), S_k(x_0)) ∈ E_k)_{1:K} } f(x | θ, M) dx

is either continuously differentiable when the observation space X is continuous, or a step function when X is finite. In this case, ξ_{x_0,θ}(ε_{1:K}) can be rewritten in terms of its elementary derivative. Next, in order to derive Eqns.
2–3, we also assume that the data-generating process M given by f(·|θ, M) is sufficiently regular to exchange the order of integration and limits; recall Section S1.1 of the PNAS Supplementary Material [4].

Construction of ABCμ

1. RMC point out that "the denomination [of ξ_{x_0,θ}(ε_{1:K}) as a] likelihood is debatable" [2] and that "the product ξ_{x_0,θ}(ε_{1:K}) π_{ε_{1:K}}(ε_{1:K}) is probabilistically incoherent" [1]. This conclusion derives from at least two observations: (i) "ξ_{x_0,θ} is strictly speaking not proportional to a density in x_0" [2] and (ii) "ξ_{x_0,θ}(ε_{1:K}) π_{ε_{1:K}}(ε_{1:K}) is not invariant under reparameterization" [2].

- In ABC, the observed data is reduced to a set of summary statistics and compared to simulated summaries with a positive, scalar-valued discrepancy function ρ(S(x), S(x_0)). For the purpose of parameter inference, we only need to plug ε = ρ(S(x), S(x_0)) into the ABC kernel. In other words, the scalar, positive error ε is in ABC merely a latent random variable, introduced to facilitate Bayesian computation [9, 10]. In [4], we derive the sampling distribution of the random variable ε, and recognize the utility of the related multiple error terms ε_{1:K}, each associated with one summary, for the purpose of model criticism. To us, ε_{1:K} is a random variable of particular statistical interest and no longer a latent variable introduced for computational reasons. Intuitively, we shift the observed summaries by ε_{1:K} and propose to infer whether summaries of x_0 that are shifted away from zero would occur at a higher frequency and hence be more probable than the (unshifted) observed summaries.
Formally, we define and identify the probability density

    ε_{1:K} → ξ_{θ,x_0}(ε_{1:K}) = lim_{h→0} ∫ δ_h{ (ρ_k(S_k(x), S_k(x_0)) − ε_k)_{1:K} } f(x | θ, M) dx,    (4)

where the δ_h function is given in Section S1.1 of the PNAS Supplementary Material [4]. For any given x_0 and θ, ξ_{θ,x_0}(ε_{1:K}) is the infinitesimal frequency with which we observe the multi-dimensional error ε_{1:K}. As RMC remark insightfully, it can be called a predictive error density that conditions on the observed data and the model parameter θ. In [4], we termed

    θ, ε_{1:K} → f_{ρ,τ}(x_0 | θ, ε_{1:K}) = ξ_{θ,x_0}(ε_{1:K})    (5)

an "augmented likelihood" simply to indicate that the state space was extended.

Example 1. Suppose we observe a single, one-dimensional data point x_0, and let us believe it is Poisson distributed with rate θ (denoted by M_1). Consider the scalar error ε = x − x_0. By construction, we have

    ξ_{θ,x_0}(ε) = lim_{h→0} ∫ δ_h( x − x_0 − ε ) Poisson(x; θ) dx = θ^{x_0+ε} e^{−θ} / (x_0+ε)! · 1{ x_0+ε ∈ [0, ∞) },

and the right-hand side equals in ε a Poisson distribution shifted by −x_0 and in x_0 a Poisson distribution shifted by −ε. Thus, when interpreted as a function in x_0, ξ_{θ,x_0}(ε) is also defined for negative values.

RMC's illuminating Poisson example serves to demonstrate how ξ_{θ,x_0}(ε) differs from a "likelihood". However, RMC go beyond our construction Eq. 4 and truncate x_0 → ξ_{θ,x_0}(ε) to positive values so as to re-adjust ξ_{θ,x_0}(ε) to the likelihood f(x_0 | θ, M_1) that is only defined for positive x_0 [1, 2]. To be clear, this re-adjustment is not part of ABCμ. Eq. 4 corresponds to a non-parametric evaluation of the sampling model in the context of model uncertainty. We adhere to the sampling model in that data is simulated under the likelihood, x ∼ f(·|θ, M), and probe the model predictions in several directions at the same time.
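The shifted-Poisson identity in Example 1 can be checked numerically. The following sketch is ours (the Poisson simulator by cdf inversion is a convenience, not part of the original example): for x ∼ Poisson(θ), the empirical distribution of ε = x − x_0 matches the Poisson pmf evaluated at x_0 + ε.

```python
import math
import random

# Numerical check of Example 1 (our own sketch): for x ~ Poisson(theta),
# the error eps = x - x0 has density xi(eps) = Poisson(x0 + eps; theta),
# i.e. a Poisson distribution shifted by -x0.

def poisson_pmf(n, theta):
    return math.exp(-theta) * theta**n / math.factorial(n) if n >= 0 else 0.0

random.seed(1)
theta, x0, n_sim = 3.0, 5, 200_000

counts = {}
for _ in range(n_sim):
    # simulate x ~ Poisson(theta) by inversion of the cdf
    u, n, p, cdf = random.random(), 0, math.exp(-theta), math.exp(-theta)
    while u > cdf:
        n += 1
        p *= theta / n
        cdf += p
    eps = n - x0
    counts[eps] = counts.get(eps, 0) + 1

# Empirical frequency of each error matches the shifted Poisson pmf:
for eps in range(-x0, 6):
    freq = counts.get(eps, 0) / n_sim
    assert abs(freq - poisson_pmf(x0 + eps, theta)) < 0.01
```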
If ε_{1:K} = 0, we have with (A4) that ξ_{θ,x_0}(ε_{1:K}) corresponds to the probability of the observed summaries under θ. For error terms different from zero, we quantify the probability of deviations from the observed summaries under the sampling model. Labeling f_{ρ,τ}(x_0 | θ, ε_{1:K}) in Eq. 5 a "shifted likelihood" therefore seems more appropriate. Because we only shift the observed summaries in Eq. 4 (with no further re-adjustments towards the original likelihood as in [1, 2]), the re-normalization required when considering Eq. 5 as a function in x_0 does not depend on ε_{1:K}, and f_{ρ,τ}(x_0 | θ, ε_{1:K}) is proportional to a density in x_0.

Next, let us recall that our error terms ε_k correspond directly to the compound functions x → ρ_k(S_k(x), S_k(x_0)). Therefore, in ABCμ, a transformation of ε_k implies a change in how the data is being summarized. Typically, such a change requires modifying the scale parameter τ of the prior density π_{ε_{1:K}} when the scale of the discrepancies changes too. Therefore, transformations of the product ξ_{x_0,θ}(ε) π_ε(ε) must also change τ in π_{ε_{1:K}} when the Jacobian is not constant. In the ABC literature, it is well known that the approximate posterior density f_{ρ,τ}(θ | x_0, M) depends on the choice of discrepancies and the stringency of τ [5, 11, 12]. Since ABCμ only uses the information provided in ABC to a fuller extent, the joint posterior density f_{ρ,τ}(θ, ε_{1:K} | x_0, M) is equally sensitive to changes in the compound functions x → ρ_k(S_k(x), S_k(x_0)) and the vector τ. In other words, the ABC and ABCμ target densities f_{ρ,τ}(θ | x_0, M) and f_{ρ,τ}(θ, ε_{1:K} | x_0, M) are not invariant under different approximation schemes. This leaves ABCμ probabilistically sound, but warrants particular caution and calls for sensitivity analyses, perhaps to a larger extent than is common practice.

Model assessment

2.
Model assessment with ABCμ requires that "the data is informative" and "is challenging" [1, 2]. In the location-family example [2] it is shown that the posterior error equals the prior error if the prior predictive density is flat.

- We agree with RMC that ABCμ cannot criticize a model when the observed data x_0 reduces to a single, one-dimensional data point, as in their examples [2] on pages 1–2. More generally, we showed that the posterior error can be interpreted as a weighted prior predictive error, Eq. 2 [4]. If the prior predictive error is uninformative, then the posterior error will simply reflect the weighting.

Example 2. Suppose a Gaussian likelihood model M_2 with unknown mean θ and fixed variance 1, and a Gaussian prior density π_θ(θ | M_2) = N(θ; θ*, h²). We have

    π(x | M_2) = ∫ N(x; θ, 1) N(θ; θ*, h²) dθ = N(x; θ*, h² + 1),

and L_{ρ,τ}(ε | M_2) = N(ε; θ* − x_0, h² + 1). We mimic a situation where π_θ is uniform by letting h → ∞, so that π(x | M_2) and L_{ρ,τ}(ε | M_2) become improper. Suppose further a Gaussian error density π_ε(ε | M_2) = N(ε; 0, τ²). Then, f_{ρ,τ}(ε | x_0, M_2) = N(ε; 0, τ²). Likewise, when models have comparable parameter spaces, the Bayes factor will be indecisive under non-informative priors π_θ [13]. Consider the alternative Gaussian model M_2' defined by f(x | θ, M_2') = N(x; θ, 3), the same prior density π_θ, and let us focus on the approximate Bayes factor

    B_{ρ,τ} = f_{ρ,τ}(x_0 | M_2') / f_{ρ,τ}(x_0 | M_2) = [ ∫ f_{ρ,τ}(x_0 | θ, M_2') π_θ(θ | M_2') dθ ] / [ ∫ f_{ρ,τ}(x_0 | θ, M_2) π_θ(θ | M_2) dθ ]

to mimic the situation that we cannot readily evaluate the likelihood. We obtain

    B_{ρ,τ} = sqrt( (τ² + h² + 1) / (τ² + h² + 3) ) exp( x_0² / [ (τ² + h² + 1)(τ² + h² + 3) ] ),

which tends rapidly to one as h → ∞.
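The closed-form approximate Bayes factor above can be verified against the ratio of the two Gaussian ABC marginals of x_0. This sketch is ours and takes θ* = 0, so the printed formula applies as written; the numerical values are illustrative.

```python
import math

# Check of Example 2 (our sketch, theta* = 0): with a Gaussian ABC kernel
# N(eps; 0, tau^2), the ABC marginal of x0 under a N(theta, sigma^2) model
# with prior N(theta; 0, h^2) is N(x0; 0, tau^2 + h^2 + sigma^2), so the
# approximate Bayes factor between sigma^2 = 3 and sigma^2 = 1 is
#   B = sqrt((t+h^2+1)/(t+h^2+3)) * exp(x0^2 / ((t+h^2+1)(t+h^2+3))),  t = tau^2.

def normal_pdf(x, var):
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)

def bayes_factor(x0, tau2, h2):
    # ratio of the two ABC marginal densities of the observed point
    return normal_pdf(x0, tau2 + h2 + 3) / normal_pdf(x0, tau2 + h2 + 1)

def closed_form(x0, tau2, h2):
    a, b = tau2 + h2 + 1, tau2 + h2 + 3
    return math.sqrt(a / b) * math.exp(x0 * x0 / (a * b))

x0, tau2 = 2.0, 0.01
for h2 in (1.0, 100.0, 1e6):
    assert abs(bayes_factor(x0, tau2, h2) - closed_form(x0, tau2, h2)) < 1e-9
# The factor tends to one as the prior flattens (h -> infinity):
assert abs(bayes_factor(x0, tau2, 1e6) - 1.0) < 1e-3
```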
In this setting, both our posterior error and approximate Bayes factor give reasonable answers for the purpose of model criticism and model comparison respectively. Based on one data point, we cannot reject the current model M_2 and likewise, the approximate Bayes factor for choosing among M_2 and a comparable model is indecisive.

Clearly, there is no guarantee that ABCμ always uncovers existing model mismatch. But how difficult is it to uncover existing discrepancies with ABCμ in practice? Typically, x_0 contains some structure and/or repeated observations. Instead of using just one data point in Example 2, let us imagine a data set of 100 samples and summarize this data with two statistics, leading to a two-dimensional posterior error density.

Example 3. Consider a data set x_0 of 100 independent samples that are Exponentially distributed with rate 1/μ_t = 0.2. We believe that each sample is generated from N(·; θ, 1) and consider a Gaussian prior density π_θ(θ | M_2) = N(θ; θ*, h²). We summarize the data with the sample mean x̄_0 and the sample median median(x), use the discrepancies ρ(S_k(x), S_k(x_0)) = S_k(x) − S_k(x_0) and consider the prior density π_{ε_{1:K}}(ε_{1:K} | M_2) = ∏_k 1/τ_k exp( −2|ε_k|/τ_k ) with τ_k = 0.1. To illustrate that ABCμ may reveal inappropriate prior specifications, we set θ* = 0 and h² = 0.1. We applied the Metropolis–Hastings sampler proposed by Marjoram et al. [14] and recorded the computed discrepancies to estimate our posterior error (mcmcABCμ; see below). A more detailed discussion of various algorithms to sample from the ABCμ target density Eq. 1 will appear elsewhere. Figures 1A–B show that the marginal densities f_{ρ,τ}(ε_k | x_0, M_2) are far from zero, suggesting that our strong prior beliefs are inadequate to explain the data.
To illustrate that ABCμ may identify a faulty sampling model, we set θ* = 5, h² = 100000. Again, we estimated the ABCμ target density numerically with mcmcABCμ. Even though π_θ is essentially flat, our marginal posterior errors do not center at zero; see Figures 1C–D. In [4], we investigated primarily the marginal posterior densities f_{ρ,τ}(ε_k | x_0, M) = ∫ f_{ρ,τ}(ε_{1:K} | x_0, M) dε_{−k}. Here, we also show heat plots which reflect more comprehensively the multi-dimensional character of our error density f_{ρ,τ}(ε_x̄, ε_median | x_0, M_2) in Figures 2A–B.

Intuitively, ABCμ will indicate model mismatch whenever all discrepancies are simultaneously not close to zero for any θ. To escape unidentifiability, the crux in Example 3 is to use multiple error terms associated with co-dependent summaries that may reveal model inconsistencies; see also [4] for a similar example. In real-world applications, (most) summary statistics are usually co-dependent, rendering ABCμ a potentially

Figure 1: Numerical reconstructions of the densities f_{ρ,τ}(ε_x̄ | x_0, M_2), f_{ρ,τ}(ε_median | x_0, M_2) in Example 3, obtained with samples generated by mcmcABCμ. The (A–B) posterior error under inappropriate prior specifications π_θ and the (C–D) posterior error under essentially flat prior specifications π_θ suggest model mismatch.
very powerful method to reveal model inconsistencies.

Figure 2: Heat plots of the density f_{ρ,τ}(ε_x̄, ε_median | x_0, M_2) in Example 3, obtained with samples generated by mcmcABCμ, (A) under inappropriate prior specifications π_θ and (B) under essentially flat prior specifications π_θ.

Because model inconsistencies can only increase with the inclusion of new summary statistics, we are typically prepared to use a large set of summaries. Moreover, it is not required that these co-dependent summaries are sufficient for θ under the data-generating process M, as we illustrate in [3], Figure 1. A more detailed discussion will appear elsewhere; here we only note that these properties are appealing because in real-world applications of ABC and ABCμ, it is typically not known whether any set of summaries is sufficient for the parameters of a given model, while it is relatively easy to come up with co-dependent summaries. However, the extent to which f_{ρ,τ}(ε_{1:K} | x_0, M) merely reflects the weighting π_{ε_{1:K}} should be checked, because the discrepancies might not retain enough information about the data to question a model (recall A3). The perhaps simplest (but not necessarily successful) approach is to compare the posterior error density to the shape of π_{ε_{1:K}}. Reassuringly, in our real-world applications, f_{ρ,τ}(ε_{1:K} | x_0, M) differs markedly from π_{ε_{1:K}}; see e.g. Figure 3 in the PNAS paper, where the prior error density is indicated in dotted lines. Crucially, since ABCμ does not reject a model when f_{ρ,τ}(ε_{1:K} | x_0, M) is close to π_{ε_{1:K}} (recall A1), no harm is done should the discrepancies not be informative. The power of ABCμ in assessing goodness-of-fit stems, firstly, from probing a model in multiple directions at the same time.
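The two-direction probe of Example 3 can be sketched with a plain rejection sampler. This is our own simplified stand-in for mcmcABCμ: the uniform kernel scale τ_k = 2 (far looser than the paper's τ_k = 0.1), the uniform prior on θ, and the use of population summaries in place of a sampled data set are all our assumptions, chosen so the sketch runs quickly.

```python
import math
import random
import statistics

# Rejection-sampling sketch of Example 3 (ours, not the paper's mcmcABC-mu):
# exponential data with mean 5 are fit with a N(theta, 1) model. The two
# discrepancies eps_mean = mean(x) - mean(x0) and eps_median = median(x) - median(x0)
# cannot both be near zero (the exponential mean exceeds its median by ~1.5,
# while a Gaussian's mean and median coincide), which reveals the mismatch.

random.seed(2)
s_mean_obs, s_median_obs = 5.0, 5.0 * math.log(2.0)   # population summaries, ~3.466
tau = 2.0                                             # accept if |eps_k| <= tau/2
accepted_mean, accepted_median = [], []

for _ in range(20_000):
    theta = random.uniform(0.0, 10.0)                 # broad prior draw
    x = [random.gauss(theta, 1.0) for _ in range(100)]
    eps_mean = statistics.fmean(x) - s_mean_obs
    eps_median = statistics.median(x) - s_median_obs
    if abs(eps_mean) <= tau / 2 and abs(eps_median) <= tau / 2:
        accepted_mean.append(eps_mean)
        accepted_median.append(eps_median)

# Both posterior errors sit well away from zero, with opposite signs:
assert len(accepted_mean) > 100
assert statistics.fmean(accepted_mean) < -0.4
assert statistics.fmean(accepted_median) > 0.4
```

Tightening τ_k toward the paper's 0.1 drives the acceptance probability to zero here, which is exactly the simultaneous conflict among co-dependent summaries that the posterior error density is designed to expose.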
We hope that our simple examples illuminate the contribution of model inconsistencies, as reflected in multiple error terms, to model assessment. Secondly, ABCμ makes it possible to criticize a model whose likelihood cannot be readily evaluated, and does not incur any extra computational cost when compared to ABC (in contrast to related predictive approaches discussed in point 4 below).

Model criticism

3. ABCμ "is strongly impacted by prior modeling [and] fails to condition on the observed data" [1].

- The ABC kernel, which can be interpreted as a prior density π_{ε_{1:K}} [8], is at the heart of ABC (recall A1) and modulates the degree to which ABC and ABCμ condition on the observed data. In other words, the ABC and ABCμ target densities are sensitive to the choice of π_{ε_{1:K}} and particularly its scale parameter τ. Indeed, ABCμ conditions on the observed data by accepting θ in relation to the magnitudes of the K computed discrepancies taken together. Accordingly, under (A2, A3), the posterior error density f_{ρ,τ}(ε_{1:K} | x_0, M) updates the prior predictive error density L_ρ(ε_{1:K} | M); see Example 4 below. Based on the observation that small errors boost the weight of the associated values of θ that are simulated from π_θ, we say that "ABCμ criticizes a fitted model". This can be illustrated with the location family in Example 2.

Example 4. Consider again the Gaussian likelihood model M_2 and a Gaussian prior density π_θ as in Example 2. For illustration purposes, let us choose π_θ broad but not flat: h² = 9. In this case,

    f_{ρ,τ}(ε | x_0, M_2) ∝ N(ε; θ* − x_0, 10) N(ε; 0, τ²) ∝ N(ε; θ̃, σ̃²)

where θ̃ = [ τ² / (τ² + 10) ] × (θ* − x_0) and σ̃² = 10τ² / (τ² + 10) ≤ 10. The posterior error density "updates" the prior predictive error in that the variance of f_{ρ,τ}(ε | x_0, M_2) is smaller than that of L_{ρ,τ}(ε | M_2).
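The Gaussian update in Example 4 can be verified directly: the product of the two normal densities is, up to a constant in ε, a normal density with the stated mean and variance. This check is our own sketch; the numerical values of θ*, x_0 and τ² are illustrative.

```python
import math

# Numerical check of Example 4 (our sketch): the product of the prior
# predictive error density N(eps; theta* - x0, 10) and the ABC kernel
# N(eps; 0, tau^2) is proportional to N(eps; theta_tilde, sigma_tilde2) with
#   theta_tilde  = tau^2 / (tau^2 + 10) * (theta* - x0)
#   sigma_tilde2 = 10 * tau^2 / (tau^2 + 10).

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

theta_star, x0, tau2 = 2.0, 5.0, 0.25
m = theta_star - x0                       # mean of the prior predictive error
theta_tilde = tau2 / (tau2 + 10.0) * m
sigma_tilde2 = 10.0 * tau2 / (tau2 + 10.0)

# The ratio product / N(eps; theta_tilde, sigma_tilde2) is constant in eps:
ratios = []
for eps in (-1.0, -0.5, 0.0, 0.5, 1.0):
    product = normal_pdf(eps, m, 10.0) * normal_pdf(eps, 0.0, tau2)
    ratios.append(product / normal_pdf(eps, theta_tilde, sigma_tilde2))
assert max(ratios) - min(ratios) < 1e-12
# The posterior error has smaller variance, and its mean is pulled toward zero:
assert sigma_tilde2 < 10.0 and abs(theta_tilde) < abs(m)
```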
Furthermore, we observe that |θ̃| is smaller than the absolute mean of L_{ρ,τ}(ε | M_2), reflecting the fact that f_{ρ,τ}(ε | x_0, M_2) criticizes a fitted model rather than the prior model (M_2, π_θ).

For the purpose of model criticism, it is important to recognize that the dependency of our posterior error on π_{ε_{1:K}} is a good thing to the extent to which the prior π_θ is not an adequate model parameterization. The smaller τ can be chosen, the more we are able to criticize a fitted model and the more we attenuate the influence of π_θ in f_{ρ,τ}(ε_{1:K} | x_0, M). The latter point can also be illustrated with the location family in Example 2; recall Section S1.2 in the PNAS Supplementary Material.

Figure 3: Heat plots of our posterior error density f_{ρ,τ}(ε_x̄, ε_median | x_0, M_3) in Example 5 for broad π_θ: (A) τ_0 = 1000; (B) α_0 = 2 and β_0 = τ_0 = 1000; and when τ is set too large, (C) α_0 = 2, β_0 = τ_0 = 1000 and τ_k = 6.4.

Let us illustrate the influence of π_θ and π_{ε_{1:K}} when the summaries are co-dependent but not sufficient for θ under model M.

Example 5. Consider again the data set x_0 of 100 independent samples that are Exponentially distributed with rate 0.2, suppose now that each sample is generated according to a Gaussian likelihood model with unknown mean μ ∈ R and σ² ≥ 0 (denoted by M_3), and summarize the data with the sample mean and median. We consider ρ_k(S_k(x), S_k(x_0)) = S_k(x) − S_k(x_0), π_{ε_{1:K}}(ε_{1:K} | M_3) = ∏_{k=1}^K 1/τ_k 1{ |ε_k| ≤ τ_k/2 }, and the prior density π_θ(θ | M_3) = π(μ | M_3) π(σ² | M_3), where π(σ² | M_3) = IG(σ²; α_0, β_0) and π(μ | M_3) ∝ 1{ |μ − μ_0| ≤ τ_0 }.
In [3], Figure 1, we chose a slightly different prior density π_θ with hyperparameters μ_0 = 5, τ_0 = 10, α_0 = 4 and β_0 = 75, such that the prior means of μ and σ² are 5 and β_0/(α_0 − 1) = 25, matching the empirical mean and variance of the observed data. To illustrate that ABCμ uncovers existing model mismatch with co-dependent summaries that are not sufficient for θ under M_3 even when π_θ differs markedly from the data or is uninformative, we now vary these hyperparameters. First, let us set τ_0 = 1000. We ran mcmcABCμ (see below) for 10,000 iterations to sample from f_{ρ,τ}(μ, σ², ε_x̄, ε_median | x_0, M_3) with τ_k set to 1.6, and repeated this run four times from overdispersed starting values to assess the convergence of the chains. Samples from the burn-in period were discarded. Figure 3A illustrates that our joint posterior error density remains virtually unchanged (compare to Figure 1C in [3]). Next, we set α_0 = 2 and β_0 = τ_0 = 1000 and ran mcmcABCμ as above. Even though π_θ is now extremely broad, our joint posterior error density continues to identify model mismatch; see Figure 3B.

ABC depends on the error threshold τ, and so does ABCμ. In order to identify model mismatch with f_{ρ,τ}(ε_{1:K} | x_0, M), existing conflicts among several summary statistics are uncovered by setting τ sufficiently small. For example, setting τ_k to 6.4, such that the acceptance probability of mcmcABCμ is larger than 80%, the posterior error is very broad and does not suggest model mismatch; see Figure 3C. In summary, the ability of ABCμ to criticize a fitted model is strongly modulated by the choice of discrepancies and the error threshold τ. Probing a model under particular assumptions on π_θ in the directions specified by ε_{1:K} is not guaranteed to uncover existing model mismatch.
For example, using ABCμ with the sample mean and the standard deviation (two independent summaries) in place of the sample mean and the median in Example 5 fails to uncover existing model mismatch. Similarly, using the sample mean and the 25% quantile fails to reveal model inconsistencies as clearly as the sample mean and the sample median. In principle, the contribution of "the data" to f_{ρ,τ}(ε_{1:K} | x_0, M) can only increase with larger K and/or more stringent choices of τ, and cannot be quantified by considering flat π_{ε_{1:K}} (see also A2). In the directions determined by ε_{1:K} and τ, the criticized model comprises the sampling model M and our prior assumptions π_θ, and we acknowledge that "having no way to distinguish between prior and sampling model inadequacy is a difficulty" [2]. More work is needed here.

4. From an ABC perspective, using the posterior predictive m(x | x_0, M) instead of the prior predictive π(x | M) "requires same computing times" [2].

- In the context of ABC, when the likelihood cannot be readily evaluated, the use of the posterior predictive (data) density

    m(x | x_0, M) = ∫ f(x | θ, M) f(θ | x_0, M) dθ

is complicated by the fact that samples from the true posterior density f(θ | x_0, M) are in general not available. However, m(x | x_0, M) can be approximated by

    m_{ρ,τ}(x | x_0, M) = ∫ f(x | θ, M) f_{ρ,τ}(θ | x_0, M) dθ.

An alternative approach for model criticism could be the approximate posterior predictive (APP) error density

    L_{ρ,τ,x_0}(ε_{1:K} | M) = ∫ δ{ (ρ_k(S_k(x), S_k(x_0)) = ε_k)_{1:K} } m_{ρ,τ}(x | x_0, M) dx.

However, for a complex model, the extra volatility induced by simulating from f(· | θ, M) means that re-simulations from f_{ρ,τ}(θ | x_0, M) need not meet the stringency requirement π_{ε_{1:K}}.
It might therefore also be useful to consider the weighted approximate posterior predictive (wAPP) error density

    f_{ρ,τ,x_0}(ε_{1:K} | x_0, M) ∝ L_{ρ,τ,x_0}(ε_{1:K} | M) π(ε_{1:K} | M).

Both L_{ρ,τ,x_0}(ε_{1:K} | M) and f_{ρ,τ,x_0}(ε_{1:K} | x_0, M) adopt a sequential approach to model criticism, comprising a training step (inference of f_{ρ,τ}(θ | x_0, M)) and a testing step (APP or wAPP). The testing step adds a computational overhead to typical ABC procedures. For example, it takes about two minutes to evaluate our seven summaries on the Saccharomyces cerevisiae PPI data set [15], and hence an extra 2 × 500/60 ≥ 16 hrs to obtain 500 samples from APP on one computer. Assuming a large acceptance probability of 10%, the extra time required to obtain 500 samples from wAPP is more than 6 days. By contrast, our posterior error incurs no additional computational cost because the discrepancies already computed in any ABC algorithm are only used to a fuller extent.

Nevertheless, one might be prepared to pay this cost if the densities L_{ρ,τ,x_0}(ε_{1:K} | M) and f_{ρ,τ,x_0}(ε_{1:K} | x_0, M) had an intrinsic advantage compared to our posterior error density f_{ρ,τ}(ε_{1:K} | x_0, M). In general, it is difficult to compare the behavior of our posterior error with APP and wAPP under model uncertainty. First, we note that L_{ρ,τ,x_0}(ε_{1:K} | M) and our f_{ρ,τ}(ε_{1:K} | x_0, M) are very different quantities, relating respectively to sequential and simultaneous approaches to model criticism. This is also reflected in their distinct asymptotic properties as τ → 0. Second, Example 7 in the Appendix demonstrates that, counter-intuitively, f_{ρ,τ}(θ | x_0, M) may be broader than π_θ under model uncertainty. An additional complication to be considered with APP and wAPP is that the same aspects of the data are used to inform both the training and the testing phase.
Hence, these quantities violate the fundamental requirement in statistical learning that the training data be independent from the testing data [16]. It is possible to use different aspects of the data during the two stages, and this brings us back to the partially predictive and conditionally predictive densities previously discussed by Bayarri and Berger [17]. Unfortunately, in real-world applications of ABC, it is often difficult to identify discrepancies that are independent of each other.

5. Our estimator ξ̂ of the augmented likelihood that is based on B repeat samples "cannot be used as a practical device because B is necessarily small" [1] "… in which case the non-parametric approximation is poor, or B is large in which case producing the x's is too time-consuming" [2].

- In our applications, we found that we obtained the largest improvements in terms of the effective sampling size for small to moderate values of B that are computationally feasible. Let us also recall that the choice of proposal kernel in ε_{1:K} is a crucial element of the second algorithm in [4] and should not be omitted when considering its efficiency. We acknowledge that our observations may not readily extend to other applications. In the same way that we augmented standard ABC to what we call Std-ABCμ in [4], it is straightforward to modify any existing ABC algorithm for the purpose of model criticism by (i) using many co-dependent, real-valued discrepancies and (ii) recording those discrepancies. For example, the Metropolis–Hastings sampler proposed by Marjoram et al. [14] can be adapted to provide samples from the target distribution

    f_{ρ,τ}(dθ, dx, dε_{1:K} | x_0, M) = [ π_θ(θ | M) π_{ε_{1:K}}(ε_{1:K} | M) f(x | θ, M) / f_{ρ,τ}(x_0 | M) ] δ_{ρ_{1:K}(x)}(dε_{1:K}) dx dθ,

where we put ρ_{1:K}(x) = ( ρ_k(S_k(x), S_k(x_0)) )_{1:K} for brevity.
Here, δ_{ρ_1:K(x)}(dε_1:K) denotes the Dirac measure at the point ρ_1:K(x). Suppose initial values θ⁰, x⁰ ∼ f(· | θ⁰, M) and set ε⁰_k = ρ_k(S_k(x⁰), S_k(x0)).

mcmcABCµ1. If now at θ, propose a move to θ′ according to a proposal density q(θ → θ′).
mcmcABCµ2. Generate x′ ∼ f(· | θ′, M) and compute ε′_k = ρ_k(S_k(x′), S_k(x0)) for k = 1, . . . , K.
mcmcABCµ3. Accept (θ′, x′, ε′_1:K) with probability mh(θ, x, ε_1:K; θ′, x′, ε′_1:K) = min{ 1, r_vanilla(θ, x, ε_1:K; θ′, x′, ε′_1:K) }, where

r_vanilla(θ, x, ε_1:K; θ′, x′, ε′_1:K) = [ π_θ(θ′ | M) q(θ′ → θ) π_ε_1:K(ε′_1:K | M) ] / [ π_θ(θ | M) q(θ → θ′) π_ε_1:K(ε_1:K | M) ],

and otherwise stay at (θ, x, ε_1:K). Then return to mcmcABCµ1.

As is standard practice, the algorithm is run sufficiently long and samples from an initial "burn-in" period are discarded. It is not difficult to show that, marginally in (θ, ε_1:K), mcmcABCµ provides samples from f_ρ,τ(θ, ε_1:K | x0, M) (Eq. 1) for suitable proposal kernels q(θ → θ′) under our regularity assumptions (A6).

Model criticism and model comparison

6. "The Bayesian foundations of ABCµ are questionable: the consequences of rejecting a model are ignored by ABCµ but include constructing another model" [1] and "this leads to wonder about the gain compared with using the Bayes factor" [2]. Moreover, "the estimation of Bayes' factors is even faster" [1] and "provides a different answer" [2].

- To us, model criticism and model comparison are important and complementary aspects of statistical reasoning. Indeed, methods for model comparison attempt to choose between candidate models, even if none of them matches the data well in one or several aspects.
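Returning to the sampler above: as a concrete illustration, the steps mcmcABCµ1-3 can be sketched in a few lines of Python for a toy Poisson model with a single discrepancy (K = 1). The model, prior, summary, kernel and proposal below are our own illustrative choices, not those of [4]; with a symmetric random-walk proposal, the q-ratio cancels in r_vanilla.

```python
import math
import random

random.seed(1)

# Illustrative choices (ours): Poisson model, Exp(1) prior, identity summary,
# discrepancy eps = x - x0, ABC kernel pi_eps(eps | M) proportional to 2^(-|eps|/tau).
x0 = 5          # observed datum
tau = 2.0       # "error threshold" of the ABC kernel

def log_prior(theta):                 # Exp(1) prior density, log scale
    return -theta if theta >= 0 else float("-inf")

def simulate(theta):                  # draw x ~ Poisson(theta), Knuth's method
    limit, k, p = math.exp(-theta), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def log_kernel(eps):                  # log of 2^(-|eps|/tau)
    return -abs(eps) / tau * math.log(2.0)

# mcmcABCmu: Metropolis-Hastings on (theta, x, eps_1:K) with K = 1.
theta = 1.0
eps = simulate(theta) - x0
samples = []
for _ in range(20000):
    theta_p = theta + random.gauss(0.0, 1.0)          # mcmcABCmu1: propose
    if theta_p >= 0:                                  # prior support check
        eps_p = simulate(theta_p) - x0                # mcmcABCmu2: simulate
        log_r = (log_prior(theta_p) + log_kernel(eps_p)
                 - log_prior(theta) - log_kernel(eps))
        if math.log(random.random()) < log_r:         # mcmcABCmu3: accept
            theta, eps = theta_p, eps_p
    samples.append((theta, eps))

post = samples[5000:]                                 # discard burn-in
post_mean_eps = sum(e for _, e in post) / len(post)
print(round(post_mean_eps, 2))  # clearly negative here, flagging mismatch at x0 = 5
```

Note that the simulation density f(x | θ, M) never appears in the acceptance ratio; only the prior and the kernel π_ε are evaluated, which is why the normalizing constant of π_ε need not be known.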
ABC is very flexible in that arbitrary data-generating processes M can be analyzed without the need to compute the likelihood, so long as the evaluation of the summary statistics is computationally tractable. ABCµ makes it possible to evaluate, at no extra computational cost, whether a model matches the observed data in terms of a large set of summary statistics, and to obtain useful indications of how a faulty model should be modified. In our work, we found that ABCµ thus enables us to iterate rapidly through the initial stages of model design and to identify one model, or a suite of models, in agreement with the data, even when the likelihood cannot be readily evaluated. We believe that the ability of ABCµ to offer statistical rigor at this point is highly valuable to areas of modern science where complex models are now formulated to explain and agree with data collected across the traditional boundaries of disciplines. For example, in biology, we face a wealth of data that is hard to analyze in its entirety with current computer resources (e.g. molecular genetic data), or we have one intricate data set (e.g. molecular interaction networks), or we cross boundaries of biological organization (e.g. systems biology). The methods presented in Ratmann et al. [4] do not address the problem of choosing a model from a suite of candidates. Model comparison when the likelihood cannot be readily evaluated is not the topic of [4], and has been introduced elsewhere [18, 19, 20, 21, 22].
Example 6. Let us re-visit RMC's Poisson example [1, 2] in order to (a) illustrate model criticism with ABCµ when the errors are discrete rather than continuous random variables, (b) inspect the case of asymmetric predictive error densities L_ρ(ε_1:K | M) and (c) re-examine the behavior of the approximate marginal likelihood as presented in [2], Figure 1. Consider the Poisson model M1 of Example 1 and suppose that π_θ(θ | M1) = exp(−θ) 1{θ ≥ 0}. We have that

π(x | M1) = ∫₀^∞ Poisson(x; θ) exp(−θ) dθ = 2^{−x−1} 1{x ≥ 0},

and hence L_ρ(ε | M1) ∝ 2^{−x0−ε−1} 1{x0 + ε ≥ 0}. The ABC kernel always depends on an "error threshold" τ (recall A1) and, given the form of L_ρ(ε | M1), we consider here π_ε: {0, 1, −1, 2, . . .} → R₀⁺ with π_ε(ε | M1) ∝ 2^{−|ε|/τ}.

Figure 4: Density plots of f_ρ,τ(ε | M1) in Example 6. Posterior mean errors are indicated by large diamonds. (A) We fix τ = 2 and consider a data point x0 = 1 that is in agreement with our prior belief π_θ = Exp(1), as well as a data point x0 = 5 that differs from our prior model. In the latter case, the posterior mean error suggests mismatch between the model and the data. (B) We fix x0 = 5 and consider the prior predictive error density (corresponding to an essentially flat π_ε with τ = 50) and the posterior error density associated with τ = 2/3. Again, we observe that this fitted model is harder to criticize than the prior model.
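The closed form π(x | M1) = 2^{−(x+1)} above is easy to confirm numerically; the following short check (our own sketch, not part of [1, 2, 4]) integrates Poisson(x; θ) exp(−θ) with a midpoint rule.

```python
import math

# Check pi(x | M1) = 2^(-(x+1)): integrate Poisson(x; theta) * exp(-theta)
# over theta in [0, 40] by a midpoint rule (the tail beyond 40 is negligible).
def marginal(x, n=200000, upper=40.0):
    h = upper / n
    fx = math.factorial(x)
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * h
        total += th ** x * math.exp(-2.0 * th) / fx * h   # e^-th * e^-th = e^-2th
    return total

for x in range(6):
    assert abs(marginal(x) - 2.0 ** -(x + 1)) < 1e-6
print("pi(x | M1) = 2^-(x+1) holds for x = 0..5")
```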
Then, our marginal posterior error is f_ρ,τ(ε | M1) ∝ 2^{−(x0+ε+|ε|/τ+1)} 1{x0 + ε ≥ 0}, with normalizing constant

f_ρ,τ(x0 | M1) = Σ_{ε=−x0}^{∞} 2^{−(x0+ε+|ε|/τ+1)}
             = 2^{−(x0+1)} [ 1{x0 > 0} ( (1 − 2^{(1−1/τ)(x0+1)}) / (1 − 2^{1−1/τ}) − 1 ) + 1/(1 − 2^{−1−1/τ}) ]    (6)

under the assumption that x0 ≥ 0. Figure 4 illustrates the posterior error density for various choices of x0 and τ, and the respective posterior means are indicated by large diamonds. Note that π_ε was here only chosen for reasons of analytical tractability; we could still use our two-sided Exponential density π_ε: R → R₀⁺ with π_ε(ε | M) = 1/τ exp(−2|ε|/τ), or the standard indicator function. Indeed, even if we do not know the set of possible discrete errors under a model M, all we miss is the correct normalizing constant of π_ε: {0, 1, −1, 2, . . .} → R₀⁺ with π_ε(ε | M) ∝ exp(−2|ε|/τ).

Figure 5: (A) Plots of the posterior mean error ∫ ε f_ρ,τ(ε | x0, M1) dε in Example 6 as a function of x0 for τ = 2/3 (red), τ = 2 (black) and τ = ∞ (grey). Provided the prior model is an adequate representation of the data (x0 = 1), the prior predictive mean error is zero (dashed lines). By contrast, the posterior mean error is not zero in this case (red and black lines), simply because L_ρ(ε | M1) is not symmetric. (B) Plots of the ABCµ approximate marginal likelihood as a function of x0 for τ = 2/3 (red), τ = 2 (black) and τ = ∞ (grey). Note that this plot differs qualitatively from the one in [2] because RMC decided to truncate ξ_θ,x0 to positive values.
This constant need not be known; see for example our algorithm mcmcABCµ. Figure 5A illustrates the posterior mean error ∫ ε f_ρ,τ(ε | x0, M1) dε as a function of x0 for various choices of τ. Setting τ = ∞, we obtain the prior predictive mean error, which is zero if the model corresponds well to the observed data (x0 = 1). Since L_ρ(ε | M1) is not symmetric around zero when the prior model is adequate (opposing A5), conditioning on error magnitude results in a slightly negative posterior mean error when x0 = 1. Let us recall that RMC decided to truncate the density ε → ξ_θ,x0(ε) to non-negative values, and then plotted the associated marginal likelihood f_trunc(x0 | M1) = ∫∫ ξ_trunc_θ,x0(ε) π_θ(θ | M1) π_ε(ε | M1) dθ dε as a function of x0 in [2], Figure 1. In Figure 5B, we plot the ABCµ marginal likelihood Eq. 6 as a function of x0. In this example, the approximate marginal likelihood decreases monotonically for all values of τ, and only small values of τ provide a suitable approximation of the true marginal likelihood (τ = 0). Thus, the posterior mean error and the approximate marginal likelihood both depend on the precise value of the "error threshold", suggesting that sensitivity analyses are required for ABCµ as well as for complementary tools for model comparison that are based on approximate marginal likelihoods.

The Bayes' factor is a tool that may address both model comparison and model criticism, depending on the formulation of the null and alternative hypotheses. It is possible to devise approximate Bayes' factors to test the null hypothesis ε = 0 versus the alternative ε ≠ 0 as a surrogate measure for the hypothesis that the model describes the data adequately well, i.e. for the purpose of model criticism (unpublished results, but see [23, 24]).
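For readers who wish to reproduce these numbers, Eq. 6 and the posterior mean errors of Figure 5A can be verified with a short script (our own sketch; `emax` is an illustrative truncation point for the infinite sum, whose terms decay geometrically).

```python
# Check Eq. 6 against a brute-force truncated sum, and compute the posterior
# mean error of Example 6 for the values of x0 and tau used in Figure 5A.
def weight(x0, tau, e):
    # unnormalized posterior error mass 2^{-(x0 + e + |e|/tau + 1)}
    return 2.0 ** -(x0 + e + abs(e) / tau + 1)

def f_norm_direct(x0, tau, emax=400):
    return sum(weight(x0, tau, e) for e in range(-x0, emax))

def f_norm_closed(x0, tau):            # Eq. 6 (valid for tau != 1)
    r, s = 2.0 ** (1 - 1 / tau), 2.0 ** (-1 - 1 / tau)
    neg = ((1 - r ** (x0 + 1)) / (1 - r) - 1) if x0 > 0 else 0.0
    return 2.0 ** -(x0 + 1) * (neg + 1 / (1 - s))

def posterior_mean_error(x0, tau, emax=400):
    z = f_norm_direct(x0, tau, emax)
    return sum(e * weight(x0, tau, e) for e in range(-x0, emax)) / z

for x0 in (0, 1, 5):
    for tau in (2 / 3, 2.0):
        assert abs(f_norm_direct(x0, tau) - f_norm_closed(x0, tau)) < 1e-12
print(round(posterior_mean_error(1, 2.0), 3))   # slightly negative at x0 = 1
print(round(posterior_mean_error(5, 2.0), 3))   # strongly negative at x0 = 5
```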
However, in both cases, the robustness of the Bayes' factor (with regard to the choice of τ and the quality of the numerical approximation of the ABC or ABCµ target densities) is debatable (unpublished results). While we agree that computing the Bayes' factor proposed in [20, 19] is faster than computing posterior predictive checks, we also note that it cannot be faster than sampling from f_ρ,τ(θ, ε_1:K | x0, M) so long as the same ABC kernel (i.e. π_ε_1:K) is used.

7. "Comparing models via the posterior error is missing the model complexity penalisation from Bayesian model comparison" [1].

- We acknowledge that model complexity is an important quantity to consider during model comparison.

8. "The choice of ε and π_ε(ε) is model dependent and the comparison [of models] reflects prior modeling, not data assessment" [1]. Finally, "using the same τ across all models does not seem to be recommendable on a general basis" [1].

- We agree that the choice of discrepancies (hence errors) and τ are application- and model-specific. Although the same summaries can typically be used across models that attempt to explain the same data, model predictions will typically vary, and hence so will the scales of the simulated summaries. This implies that the same τ may not always be used across different models. In this case, it may be difficult to compare the posterior error density f_ρ,τ(ε | x0, M) across different models. In [4], we only suggest using f_ρ,τ(ε | x0, M) to compare each model against the observed data.

Conclusion

We still find that ABCµ enables us to comprehensively quantify discrepancies between a data-generating process M and the data, simultaneously with parameter inference, even when the likelihood cannot be readily evaluated, thus providing valuable guidance on the interpretability of parameter estimates and on how to improve models.
However, the method has its limitations. The posterior error reflects an interplay between the prior predictive error and the stringency with which that error is updated; recall Eq. 2. At present, there is no formal procedure to disentangle the contributions of π_ε_1:K and L_ρ(ε_1:K | M) to f_ρ,τ(ε_1:K | x0, M); this prompted us to caution that it is difficult to convincingly associate a formal, probabilistic framework with credibility intervals of f_ρ,τ(ε_k | x0, M) [4]. In other words, there is no formal guarantee that zero is included in a 95% credibility interval with a probability of 0.95 under the hypothesis that the prior model is correct. We agree that the methods proposed by Verdinelli and Wasserman [24] are promising for the purpose of model criticism via Bayes' factors, although the sharp null hypothesis ε_1:K = 0 has limitations in itself [25]. Moreover, we emphasize that f_ρ,τ(ε_1:K | x0, M) cannot be thought of as a purely Bayesian quantity because π_ε_1:K also determines the approximation quality of f_ρ,τ(θ | x0, M); recall Eq. 3. In particular, this implies that the contribution of "the data" to f_ρ,τ(ε_1:K | x0, M) cannot be directly quantified by setting π_ε_1:K uniform. Finally, we agree with RMC that the ABC and ABCµ target densities, i.e. f_ρ,τ(θ | x0, M) and f_ρ,τ(θ, ε_1:K | x0, M), are sensitive to changes in the compound functions x → ρ_k(S_k(x), S_k(x0)) (i.e. not invariant) and may attain different meanings under different choices of τ. This leaves the whole method probabilistically coherent, but calls for sensitivity analyses. Nonetheless, we believe that ABC and ABCµ are useful to compare observed data and model simulations in a coherent way, and to make inference on the model parameters as well as the error terms.
It is difficult to interpret posterior quantities of f_ρ,τ(θ | x0, M) in place of posterior quantities of the true posterior density f(θ | x0, M), but it makes good sense to interpret them as quantities that lie between the prior and posterior density. With our inability to evaluate the likelihood f(x0 | θ, M), we acknowledge that it is too cumbersome (or would take too long) to comprehend the data in full, and turn to those aspects S_k of the data which we consider to be most relevant. In doing so, we retain some of the information in the available data, and our posterior density f_ρ,τ(θ | x0, M) updates our prior beliefs accordingly. Simultaneously, we can make use of the very same information to investigate the adequacy of a model M in explaining the data, and to update our prior predictions according to error magnitude. In conclusion, if we interpret f_ρ,τ(θ | x0, M) as an update of our prior beliefs in θ and f_ρ,τ(ε_1:K | x0, M) as an update of our prior predictive density under M, then both are useful and meaningful quantities, particularly when the likelihood f(x0 | θ, M) cannot be evaluated.

Acknowledgements

We thank Julien Cornebise for insightful comments on an earlier version of this manuscript, as well as Christian Robert for insightful discussions that stimulated parts of these notes. OR gratefully acknowledges support from NSF grant NSF-EF-08-27416; CA is supported by an EPSRC Advanced Research Fellowship; CW by the Danish Research Council; and SR by the Centre for Integrative Systems Biology at Imperial College as well as grant G-0600-609 from the Medical Research Council.

Appendix

Example 7. Reconsider the data set x0 of n = 100 independent samples that are Exponentially distributed with rate 1/µ_t = 0.2.
We believe again that each sample of x0 is generated from N(µ, σ²) with µ ∈ R, σ² ≥ 0 unknown, and take a conjugate normal inverse-gamma prior density for µ and σ² with hyperparameters µ0 ∈ R, n0 = 1, α0 > 0 and β0 > 0:

π(σ² | M3) = f_IG(α0, β0)(σ²) = β0^{α0} (σ^{−2})^{α0+1} exp(−β0 σ^{−2}) / Γ(α0),
π(µ | σ², M3) = f_N(µ0, σ²/n0)(µ),
π_ε_1:K(ε_1:K | M3) = ∏_{k=1}^{K} 1/τ_k 1{|ε_k| ≤ τ_k/2},

with hyperparameters set to µ0 = 5, n0 = 1, α0 = 4 and β0 = 75. We ran Std-ABCµ based on the summary SYMM(x) = x̄ − median(x), the discrepancy ρ(S(x), S(x0)) = SYMM(x) − SYMM(x0) and the above conjugate prior in θ, for 20,000 iterations, to obtain samples from the joint posterior density f_ρ,τ(µ, σ², ε_SYMM | x0, M3) for various values of τ. Interestingly, the approximate posterior density f_ρ,τ(θ | x0, M) broadens for decreasing values of τ, as shown in Figures 6(A-B).

Figure 6: Numerical estimates of (A) f_ρ,τ(µ | x0, M3) and (B) f_ρ,τ(σ² | x0, M3) in Example 7 for decreasing values of τ_SYMM (different colors). The respective marginal prior densities are overlaid (black, dashed).

Bibliography

[1] Robert CP, Mengersen KL, Chen C (2009) Letter: Model choice versus model criticism. Proc Natl Acad Sci USA.
[2] Robert CP, Mengersen KL, Chen C (2009) Model choice versus model criticism. Preprint, arXiv:0909.5673.
[3] Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Reply to Robert et al.: Model criticism informs model choice and model comparison. Proc Natl Acad Sci USA, in press.
[4] Ratmann O, Andrieu C, Wiuf C, Richardson S (2009) Model criticism based on likelihood-free inference, with an application to protein network evolution. Proc Natl Acad Sci USA 106:10576–10581.
[5] Beaumont M, Zhang W, Balding D (2002) Approximate Bayesian Computation in population genetics. Genetics 162:2025–2035.
[6] Jeffreys H (1961) Theory of Probability. Oxford University Press, 3rd edition.
[7] Box GEP (1980) Sampling and Bayes' inference in scientific modelling and robustness. J Roy Stat Soc A (General) 143:383–430.
[8] Wilkinson RD (2008) Approximate Bayesian Computation (ABC) gives exact results under the assumption of model error. Preprint.
[9] Andrieu C (2006) The expected auxiliary variable method for Monte Carlo simulation. URL http://www.newton.cam.ac.uk/webseminars/pg+ws/2006/scb/scbw01/1101/andrieu/. Isaac Newton Institute for Mathematical Sciences, Cambridge, UK.
[10] Andrieu C, Roberts G (2009) The pseudo-marginal approach for efficient Monte Carlo computations. Ann Stat 37:697–725.
[11] Pritchard J, Seielstad M, Perez-Lezaun A, Feldman M (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol 16:1791–1798.
[12] Joyce P, Marjoram P (2008) Approximately sufficient statistics and Bayesian computation. Stat Appl Gen Mol Biol 7:26.
[13] Berger JO, Pericchi LR (2001) Objective Bayesian methods for model selection: Introduction and comparison. IMS Lecture Notes-Monograph Series 38.
[14] Marjoram P, Molitor J, Plagnol V, Tavaré S (2003) Markov Chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA 100:15324–15328.
[15] Reguly T, Breitkreutz A, Boucher L, Breitkreutz B, Hon G, et al. (2006) Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J Biol 5:11.
[16] Hastie T, Tibshirani R, Friedman J (2001) The Elements of Statistical Learning. Springer-Verlag, New York.
[17] Bayarri MJ, Berger JO (1999) Quantifying surprise in the data and model verification (with discussion). In: Bayesian Statistics 6, Oxford University Press, pp. 53–82.
[18] Wilkinson RD (2007) Bayesian inference of primate divergence times. Ph.D. thesis, University of Cambridge.
[19] Beaumont M (2008) Simulations, genetics and human prehistory. McDonald Institute Monographs, University of Cambridge.
[20] Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical evaluation of alternative models of human evolution. Proc Natl Acad Sci USA 104:17614–17619.
[21] Toni T, Stumpf MPH (2009) Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics, btp619.
[22] Grelaud A, Robert CP, Marin JM, Rodolphe F, Taly JF (2009) ABC likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 4:317–336.
[23] Dickey JM, Lientz BP (1970) The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain. Ann Math Stat 41:214–226.
[24] Verdinelli I, Wasserman L (1995) Computing Bayes' factors using a generalization of the Savage-Dickey density ratio. J Am Stat Ass 90:614–618.
[25] Berger JO, Delampady M (1987) Testing precise hypotheses. Statistical Science 2:317–335.
