Stratified distance space improves the efficiency of sequential samplers for approximate Bayesian computation


Authors: Henri Pesonen, Jukka Corander

Henri Pesonen 1,* and Jukka Corander 2,3,4

1 Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
2 Department of Biostatistics, University of Oslo, Oslo, Norway
3 Parasites and Microbes, Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
4 Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
* Corresponding author: henri.e.pesonen@medisin.uio.no

Abstract

Approximate Bayesian computation (ABC) methods are standard tools for inferring parameters of complex models when the likelihood function is analytically intractable. A popular approach to improving the poor acceptance rate of the basic rejection sampling ABC algorithm is to use sequential Monte Carlo (ABC SMC) to produce a sequence of proposal distributions adapting towards the posterior, instead of generating values from the prior distribution of the model parameters. The proposal distribution for a subsequent iteration is typically obtained from a weighted set of samples, often called particles, of the current iteration of this sequence. Current methods for constructing these proposal distributions treat all the particles equivalently, regardless of the corresponding value generated by the sampler, which may lead to inefficiency when propagating the information across iterations of the algorithm. To improve sampler efficiency, a modified approach called stratified distance ABC SMC is introduced. The algorithm stratifies particles based on the distance between the corresponding synthetic and observed data, and then constructs distinct proposal distributions for all the strata. Taking into account the distribution of distances across the particle space leads to a substantially improved acceptance rate of the rejection sampling.
It is shown that further efficiency can be gained by using a newly proposed stopping rule for the sequential process based on the stratified posterior samples, and these advances are demonstrated by several examples.

1 Introduction

Likelihood-free inference (LFI) methods have become standard tools for statistical inference when the likelihood functions are not available in closed form but we are able to simulate from the model given permissible parameter values of our choice [Cranmer et al., 2020]. Simulation-based inference is geared to be one of the cornerstones of many intelligence-based systems [Lavin et al., 2021]. Approximate Bayesian computation (ABC) methods were among the first LFI methods, and even the most basic rejection sampling ABC method can arguably still be relevant due to its simplicity and easy applicability when forward simulation from a model is possible [Pritchard et al., 1999, Beaumont et al., 2002, Tanaka et al., 2006]. ABC methods continue to be actively developed, and their advanced extensions are being successfully used in new applications as LFI gains popularity in different research fields [Sisson et al., 2018, Lintusaari et al., 2017, Martin et al., 2021, Pesonen et al., 2022, Engebretsen et al., 2023].

Among the first extensions of ABC methods were sequential Monte Carlo ABC methods [Sisson et al., 2007, Toni et al., 2009, Moral et al., 2012]. They improve on the efficiency of rejection sampling based ABC methods by introducing a sequence of importance sampling distributions to improve the acceptance rate of the proposed simulated samples. ABC SMC methods are actively developed, and many proposals have been made to improve their accuracy and efficiency by introducing improved importance sampling distribution generation strategies or adaptive threshold selection [Silk et al., 2013, Filippi et al., 2013, Bonassi and West, 2015, Simola et al., 2020].
Other strategies to improve sequential sampling incorporate either multilevel Monte Carlo strategies [Jasra et al., 2019] or delayed acceptance sampling [Everitt and Rowińska, 2021], where computationally faster lower-level approximate models can be used to reduce the number of data points generated from the expensive accurate models. Diverging from the sampling based ABC methods, synthetic likelihood (SL) methods [Wood, 2010, Price et al., 2018] directly find an estimate for the unknown likelihood function using a set of simulated samples. The development of both approaches, ABC and SL, has been accelerated by the introduction of powerful machine learning techniques. Gutmann and Corander [2016] introduced an efficient Bayesian optimisation-based approach, Bayesian optimisation for likelihood-free inference (BOLFI), to find a surrogate for the discrepancy metric as a function of the unknown parameters. Lueckmann et al. [2018] and Papamakarios et al. [2019] introduced neural network-based flexible models that can be used as surrogates for the likelihood function or directly for the posterior distribution.

In this study we introduce a new variant of the ABC SMC methods, where we use a partitioning of the ABC acceptance regions to find better proposal distributions for the importance sampling and improve the posterior approximation. In Section 2 we discuss ABC SMC methods and present the theory for optimal selection of a particular type of propagation distribution. We present the new inference methodology in Section 3 with a novel early stopping rule in Section 4. In Section 5 we present the results of the numerical experiments and conclude the article with discussion in Section 7.

2 Overview of ABC SMC methods

ABC methods are based on approximations to the underlying models that enable us to draw a sample from an approximation to the posterior distribution $p(\theta \mid y_{\mathrm{obs}})$.
Instead of using the likelihood, we draw a joint sample of the parameters $\theta_i$ (called particles) and observations $y_i$ using the prior $p(\theta)$ and the simulator model $f(y \mid \theta)$, and retain $N$ tuples $(\theta_i, y_i, w_i)$ where the simulated observation is sufficiently similar to the actual observation $y_{\mathrm{obs}}$. Similarity is determined via a discrepancy measure $\rho(y, y_{\mathrm{obs}})$ and an error tolerance threshold $\epsilon$. The discrepancy measure and the threshold define an ABC acceptance region of simulated observations in the observation space. We define $\epsilon_0 := 0$, the ABC acceptance region as
$$\Omega_{[\epsilon_a, \epsilon_b)} := \{ y : \epsilon_a \le \rho(y, y_{\mathrm{obs}}) < \epsilon_b \},$$
and the ABC posterior as
$$p(\theta \mid y_{\mathrm{obs}}) \approx p\left(\theta \mid y \in \Omega_{[0,\epsilon)}\right) = \frac{p\left(y \in \Omega_{[0,\epsilon)} \mid \theta\right) p(\theta)}{\int p\left(y \in \Omega_{[0,\epsilon)} \mid \theta\right) p(\theta)\, d\theta}. \quad (1)$$
The proxy for the likelihood is
$$p\left(y \in \Omega_{[0,\epsilon)} \mid \theta\right) = \int f(y \mid \theta)\, \mathbb{I}\left(y \in \Omega_{[0,\epsilon)}\right) dy, \quad (2)$$
where $\mathbb{I}(\cdot)$ is the indicator function. Often we use summary statistics $S(y)$ and $S(y_{\mathrm{obs}})$ instead of the original data, but here we assume that the data can be either non-transformed or summarised, and will not consider the information loss related to the transformation.

This basic ABC method is referred to as rejection ABC. Arguably the main problem with it is the potentially poor acceptance rate associated with reasonable thresholds $\epsilon$, which is generally caused by the use of the prior as a proposal distribution. To make the ABC algorithm more efficient, importance sampling based sequential Monte Carlo techniques [Chopin and Papaspiliopoulos, 2020] have been applied in ABC. The resulting sequential Monte Carlo ABC (ABC SMC) techniques were introduced to produce a sequentially improving set of proposal distributions. To improve the ABC posterior, the algorithm uses a decreasing sequence of acceptance thresholds $\epsilon_1, \dots, \epsilon_T$ and random walk proposal kernels $K_t(\theta^{(t+1)} \mid \theta^{(t)})$.
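As a concrete reference point, a minimal sketch of the rejection ABC sampler of (1)-(2) is given below, instantiated with a simple Gaussian toy model; all function and variable names are illustrative, not from any published implementation.

```python
import numpy as np

def rejection_abc(y_obs, prior_sample, simulator, distance, eps, n_particles):
    """Basic rejection ABC: keep parameter draws whose synthetic data
    fall within the acceptance region Omega_[0, eps)."""
    accepted, n_sim = [], 0
    while len(accepted) < n_particles:
        theta = prior_sample()        # draw a particle from the prior
        y = simulator(theta)          # forward-simulate synthetic data
        n_sim += 1
        if distance(y, y_obs) < eps:  # accept if inside Omega_[0, eps)
            accepted.append(theta)
    return np.array(accepted), n_sim

# Toy model: y | theta ~ Normal(theta, 1), theta ~ Unif[-6, 6], y_obs = 0,
# with the absolute difference as the discrepancy measure.
rng = np.random.default_rng(0)
sample, n_sim = rejection_abc(
    y_obs=0.0,
    prior_sample=lambda: rng.uniform(-6.0, 6.0),
    simulator=lambda th: rng.normal(th, 1.0),
    distance=lambda y, y0: abs(y - y0),
    eps=1.0,
    n_particles=500,
)
acceptance_rate = 500 / n_sim  # typically low when the prior is wide
```

The low acceptance rate produced by a wide prior is exactly the inefficiency that the sequential schemes address.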
Kernels are used to propagate a set of weighted samples $\left\{\theta_i^{(t)}, w_i^{(t)}, y_i^{(t)}\right\}_{i=1}^{N}$ that represent the current posterior distribution at iteration $t$ to $t+1$. The ABC posterior sample of the current iteration $t$ is often used to set the parameters of the kernel. Although other choices are possible, most often the kernels are zero-centered Gaussian distributions whose covariance matrices $\Sigma_t$ are empirically calculated from a set of weighted ABC posterior samples,
$$K_t(\theta^{(t+1)} \mid \theta^{(t)}) = \mathrm{Normal}\left(\theta^{(t+1)} \mid \theta^{(t)}, \Sigma_t\right). \quad (3)$$
The ABC SMC joint proposal distribution for the transformation $t \to t+1$ is of the form
$$q_t\left(\theta^{(t)}, \theta^{(t+1)}\right) = \frac{p\left(\theta^{(t)} \mid y \in \Omega_{[0,\epsilon_t)}\right) K_t(\theta^{(t+1)} \mid \theta^{(t)})\, P\left(y' \in \Omega_{[0,\epsilon_{t+1})} \mid \theta^{(t+1)}\right)}{\int\!\!\int p\left(\theta^{(t)} \mid y \in \Omega_{[0,\epsilon_t)}\right) K_t(\theta^{(t+1)} \mid \theta^{(t)})\, P\left(y' \in \Omega_{[0,\epsilon_{t+1})} \mid \theta^{(t+1)}\right) d\theta^{(t+1)} d\theta^{(t)}}. \quad (4)$$
Filippi et al. [2013] derived an optimal Gaussian kernel (3) by minimizing the Kullback-Leibler (KL) discrepancy between (4) and
$$q'_t\left(\theta^{(t)}, \theta^{(t+1)}\right) = p\left(\theta^{(t)} \mid y \in \Omega_{[0,\epsilon_t)}\right) p\left(\theta^{(t+1)} \mid y' \in \Omega_{[0,\epsilon_{t+1})}\right), \quad (5)$$
which is defined as
$$D_{KL}(q_t \,\|\, q'_t) = \int\!\!\int \log\left(\frac{q'_t\left(\theta^{(t)}, \theta^{(t+1)}\right)}{q_t\left(\theta^{(t)}, \theta^{(t+1)}\right)}\right) q'_t\left(\theta^{(t)}, \theta^{(t+1)}\right) d\theta^{(t)} d\theta^{(t+1)}. \quad (6)$$
Given the assumptions (3) and (5), we can derive an optimal covariance for the Gaussian kernel,
$$\Sigma^{(t)} = \int\!\!\int (\theta^{(t)} - \theta^{(t+1)})(\theta^{(t)} - \theta^{(t+1)})^T\, p\left(\theta^{(t)} \mid y \in \Omega_{[0,\epsilon_t)}\right) p\left(\theta^{(t+1)} \mid y' \in \Omega_{[0,\epsilon_{t+1})}\right) d\theta^{(t)} d\theta^{(t+1)}, \quad (7)$$
which can be approximated using the ABC posterior sample by applying the threshold $\epsilon_{t+1}$ to select a re-weighted subset of size $N'$ to represent $p\left(\theta^{(t+1)} \mid y \in \Omega_{[0,\epsilon_{t+1})}\right)$:
$$\left\{\theta_{I(i)}^{(t+1)}, \tilde w_{I(i)}^{(t+1)}, y_{I(i)}^{(t+1)}\right\}_{i=1}^{N'} = \left\{\left(\theta_{I(i)}^{(t)}, \frac{w_{I(i)}^{(t)}}{\sum_{j=1}^{N'} w_{I(j)}^{(t)}}, y_{I(i)}^{(t)}\right) : y_{I(i)} \in \Omega_{[0,\epsilon_{t+1})}\right\}_{i=1}^{N'}, \quad (8)$$
where the subset of indices $\{I(i)\}_{i=1}^{N'} \subset \{1, 2, \dots, N\}$ indicates the samples for which $y_{I(i)} \in \Omega_{[0,\epsilon_{t+1})}$.

After obtaining the random walk proposal, we use it to generate candidates $\theta'$ by propagating randomly selected particles $\theta_i^{(t)}$, drawn with sampling probabilities $w_i^{(t)}$ from the ABC posterior, using the kernel $\theta' \sim K_t\left(\theta^{(t+1)} \mid \theta_i^{(t)}\right)$, and generating new synthetic observations $y' \sim p(y \mid \theta')$. The new synthetic data are then compared against the observed data to determine whether or not they are within the acceptance region of the next iteration, $y' \in \Omega_{[0,\epsilon_{t+1})}$, and the sampling is repeated until a sample set of size $N$ is obtained. The re-weighting follows the general importance sampling procedure
$$w_i^{(t+1)} \propto \frac{p\left(\theta_i^{(t+1)}\right)}{\sum_{j=1}^{N} w_j^{(t)} K_t\left(\theta_i^{(t+1)} \mid \theta_j^{(t)}\right)}, \quad (9)$$
where $p(\cdot)$ is the prior of the parameters and the importance sampling distribution $\sum_{j=1}^{N} w_j^{(t)} K_t\left(\theta^{(t+1)} \mid \theta_j^{(t)}\right)$ is empirically estimated from the weighted sample as a Gaussian mixture.

Filippi et al. [2013] also introduced a locally optimal Gaussian kernel with a covariance analogous to (7) by optimising the covariance for each particle $\theta_i^{(t)}$ separately, i.e.
setting effectively $p\left(\theta \mid y \in \Omega_{[0,\epsilon_t)}\right) = \delta(\theta - \theta_i^{(t)})$, where $\delta(\cdot)$ is the delta function. This leads to a kernel covariance
$$\Sigma_i^{(t)} = \int (\theta_i^{(t)} - \theta^{(t+1)})(\theta_i^{(t)} - \theta^{(t+1)})^T\, p\left(\theta^{(t+1)} \mid y \in \Omega_{[0,\epsilon_{t+1})}\right) d\theta^{(t+1)}, \quad (10)$$
which can again be approximated based on the sample (8).

3 Stratified sampling ABC SMC

In most cases ABC SMC is superior to the basic rejection ABC algorithm, especially when the prior distribution of the parameters is far off from the posterior distribution. There are multiple approaches for improving the standard ABC SMC: choosing different proposal and target distributions, determining the threshold sequence adaptively, or setting rules for stopping the sequential process. Especially in the early iterations of the algorithm, the kernels calculated empirically from the data can have large scale parameters, which in turn can result in slower convergence to the approximate posterior distribution that is within our tolerance of error.

We can investigate ABC SMC more closely by looking at how the particles are accepted and how the weights for the particles are formed. At the $t$th iteration, the ABC posterior sample approximation is $\left\{\theta_i^{(t)}, w_i^{(t)}, y_i^{(t)}\right\}_{i=1}^{N}$. Using the ABC SMC algorithm described in Section 2, the posterior approximation at the $(t+1)$th iteration will be $\left\{\theta_i^{(t+1)}, w_i^{(t+1)}, y_i^{(t+1)}\right\}_{i=1}^{N_{\mathrm{tot}}^{(t+1)}}$, where $\theta_i^{(t+1)} \mid \theta_j^{(t)} \overset{\text{i.i.d.}}{\sim} \sum_{j=1}^{N} w_j^{(t)} K_t(\theta^{(t+1)} \mid \theta_j^{(t)})$, $N_{\mathrm{tot}}^{(t+1)} \ge N$ is the total number of simulated samples at the $(t+1)$th iteration, and
$$w_i^{(t+1)} \propto \begin{cases} \dfrac{p\left(\theta_i^{(t+1)}\right)}{\sum_{j=1}^{N} w_j^{(t)} K_t\left(\theta_i^{(t+1)} \mid \theta_j^{(t)}\right)}, & \text{if } y_i^{(t+1)} \in \Omega_{[0,\epsilon_{t+1})}, \quad \left[\text{w.p. } P\left(y \in \Omega_{[0,\epsilon_{t+1})} \mid \theta_i^{(t+1)}\right)\right], \\[1ex] 0, & \text{if } y_i^{(t+1)} \notin \Omega_{[0,\epsilon_{t+1})}, \quad \left[\text{w.p. } P\left(y \notin \Omega_{[0,\epsilon_{t+1})} \mid \theta_i^{(t+1)}\right)\right]. \end{cases} \quad (11)$$
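The two computational pieces derived above, the importance weight (9)/(11) and the locally optimal kernel covariance (10), can be sketched as follows for a multivariate Gaussian kernel; helper names are illustrative.

```python
import numpy as np

def gaussian_mixture_density(theta, centers, weights, cov):
    """Evaluate sum_j w_j K_t(theta | theta_j) for a Gaussian random-walk
    kernel with a common covariance `cov`."""
    inv = np.linalg.inv(cov)
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * cov))
    diff = centers - theta                                # (N, d)
    dens = np.exp(-0.5 * np.einsum("nd,de,ne->n", diff, inv, diff)) / norm
    return float(np.dot(weights, dens))

def smc_weight(theta_new, prior_pdf, theta_old, w_old, cov):
    """Unnormalised importance weight (9): prior over the mixture density."""
    return prior_pdf(theta_new) / gaussian_mixture_density(
        theta_new, theta_old, w_old, cov)

def local_optimal_cov(theta_i, theta_next, w_next):
    """Monte Carlo estimate of the locally optimal covariance (10): the
    weighted second moment of (theta_i - theta^{(t+1)}) over the
    re-weighted subset (8) of the next-iteration posterior sample."""
    w = np.asarray(w_next, dtype=float)
    w = w / w.sum()
    diff = theta_next - theta_i                           # (N', d)
    return np.einsum("n,nd,ne->de", w, diff, diff)
```

A Gaussian mixture with one component per particle is exactly the importance sampling distribution described in the text; the covariance estimator is reused later for the stratified kernels.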
Samples are generated until $N$ samples have non-zero weights, so it would be computationally desirable to generate as many samples as possible such that $y_i^{(t+1)} \in \Omega_{[0,\epsilon_{t+1})}$, and thus achieve a large acceptance rate $N / N_{\mathrm{tot}}^{(t+1)}$. If $P\left(y \in \Omega_{[0,\epsilon_{t+1})} \mid \theta\right)$ were available, it would be possible to query more samples from the parts of the parameter space that generate non-zero weighted samples with higher probability. Unfortunately this is in practice impossible in any realistic application of ABC. The average acceptance probability at the $(t+1)$th iteration can be expressed as
$$\int\!\!\int P\left(y' \in \Omega_{[0,\epsilon_{t+1})} \mid \theta^{(t+1)}\right) K_t(\theta^{(t+1)} \mid \theta^{(t)})\, p\left(\theta^{(t)} \mid y \in \Omega_{[0,\epsilon_t)}\right) d\theta^{(t)} d\theta^{(t+1)}, \quad (12)$$
which we aim to improve, especially at the early iteration rounds $t$, by introducing a novel importance sampling approach that modifies the locally optimal ABC SMC defined by the kernel covariance (10), using a partition of the ABC posterior approximation into components defined by a set of non-overlapping acceptance regions.

The observation space bounded to the acceptance region at threshold $\epsilon_t$ of ABC SMC can be partitioned based on the threshold sequence,
$$\Omega_{[0,\epsilon_t)} = \bigcup_{k=t}^{T} \Omega_{[\epsilon_{k+1}, \epsilon_k)}, \quad (13)$$
where $\epsilon_{T+1} := 0$. We can use the partitioning and the law of total probability to write the ABC posterior correspondingly as
$$p\left(\theta^{(t)} \mid y \in \Omega_{[0,\epsilon_t)}\right) = \sum_{k=t}^{T} p\left(\theta^{(t)} \mid y' \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) P\left(y' \in \Omega_{[\epsilon_{k+1}, \epsilon_k)} \mid y \in \Omega_{[0,\epsilon_t)}\right). \quad (14)$$

Example 1. We define a generative model for a single observation from a Gaussian model with mean $\theta$ and variance 1, with a uniform prior for $\theta$,
$$p(y \mid \theta) = \mathrm{Normal}(y \mid \theta, 1), \qquad p(\theta) = \mathrm{Unif}_{[-6,6]}(\theta), \quad (15)$$
and observe $y_{\mathrm{obs}} = 0$. The distance measure is the absolute difference between the synthetic and observed data. Figure 1 illustrates the initial situation, where an acceptance threshold sequence $\epsilon_{1:5}$ is selected as $[\infty, 4, 3, 2, 1]$.
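Assigning each particle to its partitioned acceptance-region band (13) is a simple lookup against the threshold sequence; a sketch using the thresholds of Example 1 (function name illustrative):

```python
import numpy as np

def stratify(distances, eps_seq):
    """Return, for each distance, the 1-indexed band k such that the
    distance lies in Omega_[eps_{k+1}, eps_k), where eps_seq =
    [eps_1, ..., eps_T] is decreasing and eps_{T+1} := 0."""
    eps = list(eps_seq) + [0.0]         # append eps_{T+1} = 0
    bands = np.empty(len(distances), dtype=int)
    for i, d in enumerate(distances):
        for k in range(len(eps_seq)):
            if eps[k + 1] <= d < eps[k]:
                bands[i] = k + 1        # 1-indexed bands, as in the paper
                break
    return bands

# Example 1 thresholds: eps_{1:5} = [inf, 4, 3, 2, 1]
eps_seq = [np.inf, 4.0, 3.0, 2.0, 1.0]
bands = stratify([0.5, 3.5, 4.5], eps_seq)  # -> bands [5, 2, 1]
```

A distance of 0.5 lands in the innermost band $\Omega_{[0,1)}$, while 4.5 falls only in the outermost band $\Omega_{[4,\infty)}$.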
The first-iteration ABC posterior corresponds to the prior and the threshold $\epsilon_1 = \infty$.

Figure 1: Example 1. Partitioning of the ABC posterior distribution based on the acceptance regions. Here the posterior is actually the prior, as the acceptance region $\Omega_{[0,\infty)}$ is the whole observation space. In the first round of ABC SMC we would wish to transfer particles from $p(\theta \mid y \in \Omega_{[0,\infty)})$ into $p(\theta \mid y \in \Omega_{[0,4)})$, which is a combination of the four other posterior components.

The partitioned posterior motivates the proposed ABC SMC approach. We propose a modification to the locally optimal Gaussian transition kernel by removing the dependence on iteration $t$, and instead define it for each posterior component conditioned on the events of observations belonging to the acceptance regions $\Omega_{[\epsilon_{k+1}, \epsilon_k)}$. Given the set of predefined acceptance regions and the ABC posterior sample at the $t$th iteration, we first define a locally optimal transition kernel similar to (10) by taking into account which partitioned acceptance region band a particle resides in, and then balance the particle selection probability by using approximations to the weights in (14) that we obtain as a by-product of previous iterations of ABC SMC.

Let $(\theta_i^{(t)}, y_i^{(t)}, w_i^{(t)}) \in \left\{(\theta^{(t)}, y^{(t)}, w^{(t)}) \mid y^{(t)} \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right\}$ be a tuple that defines a particle from the ABC posterior at iteration $t \le k < T$ that is within a partitioned acceptance region band $\Omega_{[\epsilon_{k+1}, \epsilon_k)}$. We seek a transition kernel
$$K_k(\theta^{(t+1)} \mid \theta_i^{(t)}) = \mathrm{Normal}\left(\theta^{(t+1)} \mid \theta_i^{(t)}, \Sigma_i^{(k)}\right), \quad (16)$$
where the covariance $\Sigma_i^{(k)}$ is optimal for mutating the particle $\theta_i^{(t)}$ associated with $\Omega_{[\epsilon_{k+1}, \epsilon_k)}$ so that it produces an observation from the acceptance region $\Omega_{[0,\epsilon_{k+1})}$.
The idea here is that the particles from each iteration $t$ that are associated with the acceptance region $\Omega_{[\epsilon_{k+1}, \epsilon_k)}$ will be transferred to a posterior defined by $\Omega_{[0,\epsilon_{k+1})}$, which is one step closer to the final acceptance region $\Omega_{[0,\epsilon_T)}$ as defined by the fixed threshold sequence. Traditionally, all particles would be transferred to approximate a posterior defined by $\Omega_{[0,\epsilon_{t+1})}$, regardless of the associated values $y_i^{(t)}$. The proposed modification can be achieved using the same approach as for deriving the locally optimal transition kernel covariance (10). The proposed ABC SMC joint proposal distribution for the transformation at the $t$th iteration is of the form
$$q_k\left(\theta^{(t)}, \theta^{(t+1)}\right) = \frac{p\left(\theta^{(t)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) K_k(\theta^{(t+1)} \mid \theta^{(t)})\, P\left(y' \in \Omega_{[0,\epsilon_{k+1})} \mid \theta^{(t+1)}\right)}{\int\!\!\int p\left(\theta^{(t)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) K_k(\theta^{(t+1)} \mid \theta^{(t)})\, P\left(y' \in \Omega_{[0,\epsilon_{k+1})} \mid \theta^{(t+1)}\right) d\theta^{(t+1)} d\theta^{(t)}}. \quad (17)$$
As before, we derive an optimal Gaussian kernel by minimizing the Kullback-Leibler (KL) discrepancy between (17) and
$$q'_k\left(\theta^{(t)}, \theta^{(t+1)}\right) = p\left(\theta^{(t)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) p\left(\theta^{(t+1)} \mid y' \in \Omega_{[0,\epsilon_{k+1})}\right), \quad (18)$$
which leads to
$$\Sigma^{(k)} = \int\!\!\int (\theta^{(t)} - \theta^{(t+1)})(\theta^{(t)} - \theta^{(t+1)})^T\, p\left(\theta^{(t)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) p\left(\theta^{(t+1)} \mid y' \in \Omega_{[0,\epsilon_{k+1})}\right) d\theta^{(t)} d\theta^{(t+1)}, \quad (19)$$
for which a locally optimal version at $\theta_i^{(t)}$ is
$$\Sigma_i^{(k)} = \int \left(\theta_i^{(t)} - \theta^{(t+1)}\right)\left(\theta_i^{(t)} - \theta^{(t+1)}\right)^T p\left(\theta^{(t+1)} \mid y \in \Omega_{[0,\epsilon_{k+1})}\right) d\theta^{(t+1)}, \quad (20)$$
where $\theta_i^{(t)}$ is such that the associated observation $y_i^{(t)} \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}$. In Figure 2 we illustrate the cascading nature of the ABC posterior approximations at a single iteration $t$ of the method, and how the transition kernels are defined locally optimally for each of the particles separately, based on the posterior component they belong to.

Figure 2: Example 1 continued.
The particles residing in different partitioned components of the ABC posterior are transitioned with a kernel defined by the cascading structure of improving ABC posterior estimates.

We express the average acceptance rate at the transition from round $t \to t+1$ (12) as
$$\int\!\!\int \sum_{l=t+1}^{T} \sum_{k=t}^{T} P\left(y' \in \Omega_{[\epsilon_{l+1}, \epsilon_l)} \mid \theta^{(t+1)}\right) K_k(\theta^{(t+1)} \mid \theta^{(t)})\, p\left(\theta^{(t)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) d\theta^{(t)} d\theta^{(t+1)}$$
$$= \sum_{l=t+1}^{T} \sum_{k=t}^{T} \int\!\!\int P\left(y' \in \Omega_{[\epsilon_{l+1}, \epsilon_l)} \mid \theta^{(t+1)}\right) K_k(\theta^{(t+1)} \mid \theta^{(t)})\, p\left(\theta^{(t)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) d\theta^{(t)} d\theta^{(t+1)} = \sum_{l=t+1}^{T} \sum_{k=t}^{T} C_{l,k}^{(t)}, \quad (21)$$
where $C_{l,k}^{(t)}$ is the average rate of transitioning a particle from the posterior $p\left(\theta^{(t)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right)$ with kernel $K_k(\cdot \mid \cdot)$ and generating a synthetic observation $y \in \Omega_{[\epsilon_{l+1}, \epsilon_l)}$. Given that we approximate $C_{l,k}^{(t)} \approx C_{l,k}$, $l = t+1, \dots, T$, $k = t, \dots, T$, at each iteration of the ABC SMC, we can use these regional weights as a basis for a stratification strategy to compose an importance sampling distribution for more efficient simulation acceptance, by using the information we have obtained about the different partitioned posterior components and their average acceptance rates $C_{l,k}$. In practice we approximate $C_{l,k}$ by counting the frequencies $f_{l,k}^{(t)}$ of how often samples drawn from
$$p\left(\theta^{(t+1,t)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) := \int K_k(\theta^{(t+1)} \mid \theta^{(t)})\, p\left(\theta^{(t)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) d\theta^{(t)}$$
generate observations $y \in \Omega_{[\epsilon_{l+1}, \epsilon_l)}$. This is carried out as a part of the weighted ABC sampling at each iteration.
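The frequency bookkeeping just described can be kept as per-iteration count matrices, with rows indexed by the generated band $l$ and columns by the source band $k$; a sketch of turning them into cumulative rate estimates (names and toy numbers are illustrative):

```python
import numpy as np

def cumulative_rates(freqs):
    """Pool per-iteration count matrices f^(1), ..., f^(t) (each of shape
    (T, T), rows = generated band l, columns = source band k) and normalise
    each column, giving empirical estimates of the rates C_{l,k}."""
    total = np.sum(freqs, axis=0)               # (T, T) pooled counts
    col_tot = total.sum(axis=0, keepdims=True)  # total draws per source band
    return np.divide(total, col_tot,
                     out=np.zeros_like(total, dtype=float),
                     where=col_tot > 0)

# Two iterations of counts for T = 3 bands (toy numbers):
f1 = np.array([[0, 0, 0], [2, 1, 0], [0, 1, 4]], dtype=float)
f2 = np.array([[1, 0, 0], [1, 2, 0], [0, 1, 4]], dtype=float)
C = cumulative_rates([f1, f2])
# each column of C sums to 1 over the bands that received any samples
```

Pooling counts over all past iterations is what makes the estimates improve cumulatively as the algorithm proceeds.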
The distribution of the posterior predictive observations from the posterior at the $t$th iteration, which we denote here as $y^{(t+1,t)}$,
$$p\left(y^{(t+1,t)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) := \int p\left(y^{(t+1,t)} \mid \theta^{(t+1,t)}\right) p\left(\theta^{(t+1,t)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) d\theta^{(t+1,t)},$$
can be estimated with the probability mass distribution of the discretised events $y' \in \Omega_{[\epsilon_{l+1}, \epsilon_l)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}$. This is approximated using the frequencies
$$\hat C_{l,k}^{(t)} = \sum_{m=1}^{t} \frac{f_{l,k}^{(m)}}{\sum_{m'=1}^{t} \sum_{l'=1}^{T} f_{l',k}^{(m')}}, \quad l = 1, \dots, T, \quad k = t, \dots, T. \quad (22)$$
The full weight for iteration $t$ is obtained by first taking the sum over the different acceptance regions that are within $\Omega_{[0,\epsilon_{t+1})}$,
$$\widehat W_k^{(t)} = \sum_{l=t+1}^{T} \hat C_{l,k}^{(t)} = \sum_{l=t+1}^{T} \sum_{m=1}^{t} \frac{f_{l,k}^{(m)}}{\sum_{m'=1}^{t} \sum_{l'=1}^{T} f_{l',k}^{(m')}}, \quad k = t, \dots, T. \quad (23)$$
We illustrate the cumulative predictive probability weighting in Figure 3. The cumulative predictive probabilities improve along with the iterations as we obtain realisations of posterior predictive observations from the different posterior components. We wish to re-balance the importance sampling distribution by giving more weight to particles that have been observed to generate samples within acceptance regions that will be included in the subsequent iterations of the algorithm. At iteration $t$, our approximation to the posterior importance sampling distribution can be constructed from the posterior distribution $\left\{(\theta_i^{(t)}, y_i^{(t)}, w_i^{(t)}, I_i^{(t)})\right\}_{i=1}^{N}$, where $I_i^{(t)} \in \{1, \dots, T\}$ indicates the corresponding partition of the observation space. For example, if $I_i^{(t)} = k$, then $y_i^{(t)} \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}$. The empirical importance sampling distribution is $\left\{(\theta_i^{(t)}, y_i^{(t)}, \hat w_i^{(t)}, I_i^{(t)})\right\}_{i=1}^{N}$, where $\hat w_i^{(t)}$ is the re-balanced weight.
It is calculated using the stratified observation space by first normalizing the weights within each stratum, then weighting them with (23), and finally re-normalising the whole set, as
$$\hat w_i^{(t)} \propto \frac{w_i^{(t)}}{\sum_{j : I_j^{(t)} = I_i^{(t)}} w_j^{(t)}} \cdot \widehat W_{I_i^{(t)}}^{(t)}. \quad (24)$$
Based on the re-balanced weights, we draw on average a fraction $\widehat W_k^{(t)}$ of the samples from $\left\{(\theta_i^{(t)}, y_i^{(t)}, w_i^{(t)}, I_i^{(t)}) : I_i^{(t)} = k\right\}_{i=1}^{N}$, because the predictive distribution conditioned on $y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}$ has been empirically approximated to generate a fraction $\widehat W_k^{(t)}$ of the samples within $\Omega_{[0,\epsilon_{t+1})}$. This stratified sampling approach should accelerate the acceptance rate especially in the early iterations of ABC SMC, when the transition kernels can have large covariances.

Figure 3: Example 1 continued. After the first ABC SMC iteration the particles are mutated and a new set of synthetic observations is generated. We calculate the probabilities for each posterior component of generating observations in each of the acceptance regions. These probabilities are cumulatively improved along with the iterations. In this case we observe that, as expected, the posterior component related to $\Omega_{[0,1)}$ has the largest probabilities of generating observations that will be accepted in any round. These component-wise predictive probabilities are then used to re-weight the ABC posterior sample weights.

The largest weights for the samples calculated via (9) are assigned to the samples from the tails of the importance sampling distribution, when the corresponding synthetic observations are within the acceptance region. However, these outlying samples with large weights may also result in a lower acceptance rate in the subsequent iterations.
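The re-balancing step (24) can be sketched as follows, assuming the particle weights, their band indicators, and the stratum weights $\widehat W_k^{(t)}$ of (23) are given as arrays (names and toy numbers are illustrative):

```python
import numpy as np

def rebalance_weights(w, bands, W_hat):
    """Eq. (24): normalise weights within each stratum, scale each stratum
    by its cumulative predictive weight W_hat[k], then renormalise
    the whole set."""
    w = np.asarray(w, dtype=float)
    bands = np.asarray(bands)
    w_new = np.zeros_like(w)
    for k in np.unique(bands):
        mask = bands == k
        w_new[mask] = w[mask] / w[mask].sum() * W_hat[k]
    return w_new / w_new.sum()

# Three particles in two strata; stratum 1 is predicted to be four times
# more likely to generate acceptable samples, so its particle gains
# sampling probability.
w_hat = rebalance_weights(w=[1.0, 1.0, 2.0],
                          bands=[0, 0, 1],
                          W_hat={0: 0.2, 1: 0.8})
# -> [0.1, 0.1, 0.8]
```

Note that the within-stratum normalisation preserves the relative weights of particles inside a band; only the mass allotted to each band changes.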
We can estimate the locally optimal covariance kernel (20) based on the weighted sample by
$$\hat\Sigma_i^{(k)} = \sum_{j=1}^{N'} \hat\omega_{I(j)}^{(t)} \left(\theta_i^{(t)} - \theta_{I(j)}^{(t)}\right)\left(\theta_i^{(t)} - \theta_{I(j)}^{(t)}\right)^T,$$
where $\theta_i^{(t)}$ is associated with the acceptance region $\Omega_{[\epsilon_{k+1}, \epsilon_k)}$. The index set $\{I(1), \dots, I(N')\} \subset \{1, \dots, N\}$ is such that $I_{I(i)}^{(t)} \ge k+1$, and the re-normalised weights are calculated as
$$\hat\omega_{I(i)}^{(t+1)} = \frac{\hat w_{I(i)}^{(t)}}{\sum_{i=1}^{N'} \hat w_{I(i)}^{(t)}}. \quad (25)$$
The proposed algorithm is summarised for a single iteration of ABC SMC in Algorithm 1.

4 Stopping rule

ABC SMC is an iterative sampling algorithm that requires more computational resources after each iteration as the acceptance threshold is decreased, either adaptively or following a predetermined schedule, as in the case of the approach presented here. However, after some point the improvements obtained from decreasing the threshold may no longer be significant enough to justify the resources required for improving the approximation. We propose a novel stopping rule based on the evolution of the prediction probabilities $C_{l,k}^{(t)}$.

Algorithm 1 Empirical estimation of the posterior predictive probability of acceptance events
1: Input 1: Posterior sample $\left\{(\theta_i^{(t)}, y_i^{(t)}, w_i^{(t)}, I_i^{(t)})\right\}_{i=1}^{N}$.
2: Input 2: Frequencies $f_{l,k}^{(m)}$, $m = 1, \dots, t$, $k = 1, \dots, T$, $l = 1, \dots, T$, as in (22).
3: Calculate the importance sampling distribution $\left\{(\theta_i^{(t)}, y_i^{(t)}, \hat w_i^{(t)}, I_i^{(t)})\right\}_{i=1}^{N}$ using (24).
4: Set $n = 1$.
5: Set $f_{l,k}^{(t+1)} = 0$, $k = 1, \dots, T$, $l = 1, \dots, T$.
6: while $n \le N$ do
7:   Select $(\theta^{(t)}, y^{(t)}, I^{(t)})$ from $\left\{(\theta_i^{(t)}, y_i^{(t)}, \hat w_i^{(t)}, I_i^{(t)})\right\}_{i=1}^{N}$ with probability $\hat w_i^{(t)}$.
8:   Generate $\theta' \sim K_{I^{(t)}}(\theta \mid \theta^{(t)})$.
9:   Generate $y^{(t+1,t)} \sim f(y \mid \theta')$.
10:  Find $k$ such that $y^{(t+1,t)} \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}$.
11:  Set $f_{k, I^{(t)}}^{(t+1)} = f_{k, I^{(t)}}^{(t+1)} + 1$.
12:  if $y^{(t+1,t)} \in \Omega_{[0,\epsilon_{t+1})}$ then
13:    $\theta_n^{(t+1)} = \theta'$
14:    $y_n^{(t+1)} = y^{(t+1,t)}$
15:    $I_n^{(t+1)} = k$
16:    Calculate $w_n^{(t+1)} \propto p(\theta')\, /\, \sum_{i=1}^{N} \hat w_i^{(t)} K_{I^{(t)}}(\theta' \mid \theta_i^{(t)})$
17:    $n = n + 1$
18:  end if
19: end while
20: Normalise $w_n^{(t+1)} = w_n^{(t+1)} / \sum_{i=1}^{N} w_i^{(t+1)}$
21: Output 1: Posterior sample $\left\{(\theta_i^{(t+1)}, w_i^{(t+1)}, y_i^{(t+1)}, I_i^{(t+1)})\right\}_{i=1}^{N}$
22: Output 2: Frequencies $f_{l,k}^{(m)}$, $m = 1, \dots, t+1$, $k = 1, \dots, T$, $l = 1, \dots, T$

The prediction probabilities of the partitioned acceptance regions can be expressed as a probability mass function defined by the probabilities
$$P\left(y^{(t)} \in \Omega_{[\epsilon_{l+1}, \epsilon_l)} \mid y \in \Omega_{[\epsilon_{k+1}, \epsilon_k)}\right) = C_{l,k}^{(t)}, \quad l = 1, \dots, T. \quad (26)$$
The target of the ABC SMC is the approximate posterior at the final iteration $T$, and we use the probability mass $C_{:,T}^{(t)}$ as a proxy to monitor the evolution of the approximate posterior. We use the notation $C_{:,k}^{(t)}$ to denote the probability mass function of the acceptance region events. If at iteration $t$ the approximation $\hat C_{:,t}^{(t)}$ already resembles $\hat C_{:,T}^{(t)}$, we can consider stopping the algorithm early. Computationally this is convenient, as we can count the frequencies (22) as part of the proposed Algorithm 1. We can use the Kullback-Leibler (KL) divergence
$$D_{KL}\left(C_{:,T}^{(t)} \,\Big\|\, C_{:,t}^{(t)}\right) = \sum_{l=1}^{T} C_{l,T}^{(t)} \log\left(\frac{C_{l,T}^{(t)}}{C_{l,t}^{(t)}}\right) \quad (27)$$
as the quantified difference between the target and the current state.
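The stopping diagnostic (27) compares two columns of the estimated rate matrix; a sketch, assuming the columns of $\hat C$ are supplied as probability vectors (names illustrative):

```python
import numpy as np

def kl_stopping_diagnostic(c_final, c_current, floor=1e-12):
    """Eq. (27): D_KL(C_{:,T} || C_{:,t}) between the predicted pmfs of the
    final-iteration and current-iteration strata. Zero-probability terms in
    c_final contribute nothing; c_current is floored to avoid division
    by zero in the empirical estimates."""
    p = np.asarray(c_final, dtype=float)
    q = np.maximum(np.asarray(c_current, dtype=float), floor)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Identical pmfs give zero divergence; the sequential run can be stopped
# once this value plateaus near zero.
d0 = kl_stopping_diagnostic([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])  # -> 0.0
d1 = kl_stopping_diagnostic([0.2, 0.3, 0.5], [0.6, 0.3, 0.1])  # -> > 0
```

Flooring the denominator is a pragmatic guard for the empirical setting discussed below, where some bands of $\hat C_{:,T}^{(t)}$ may contain very few samples.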
Note that by definition we can start monitoring the convergence only after the second iteration, as all the parameters in the first round are from the prior instead of the components of the partitioned approximate posterior. It is also notable that the early stopping rule monitors whether we have achieved the posterior predictive performance of the final iteration round. Therefore, if the final threshold is very small, it is possible that our empirical approximation of $C_{:,T}^{(t)}$ is poor or, in the worst case, does not contain a single sample. It is recommended to set a minimum number of samples required for approximating $C_{:,T}^{(t)}$, and only after it is achieved is the early stopping rule monitoring activated.

Example 2. We illustrate the behaviour of the KL divergence using the model from Example 1, augmenting the model with more acceptance regions defined by the threshold sequence $\epsilon_{1:9} = [\infty, 4, 3, 2, 1, 0.8, 0.6, 0.4, 0.2]$. The experiment is carried out with $N = 150000$ samples. We calculate the posterior predictive probabilities of the acceptance regions using the cumulative frequencies, and for each of the iterations we calculate the KL divergence between the consecutive probability mass functions. The evolution of the KL divergence is illustrated in Figure 4. We see how the KL divergence plateaus after the 5th iteration, as the predicted probability mass function does not significantly change even when the threshold of the ABC posterior decreases.

Figure 4: Example 1 continued. The bars in the upper figure illustrate the predicted probability mass functions (pmf) (26). At each iteration we calculate the pmf defined by (26) and the corresponding KL divergence (27), which is reported in the bottom figure.

5 Results

We compare the proposed stratified sampling ABC SMC approach to the globally and locally optimal ABC SMC methods introduced in Section 2, denoted by Global and Local, respectively.
Two versions of the stratified sampling approach are tested here. Stratified simple uses only the new transition kernel, whereas Stratified uses both the transition kernel and the proposed importance sampling distribution. In each test case, we have fixed the unknown parameters $\theta_0$ and repeated the experiment 50 times with different randomly generated observations $y_{\mathrm{obs}} \sim p(y \mid \theta_0)$.

We assume that accuracy-wise all the methods will perform similarly, as the main difference between the methods is the sampling strategy. We investigate the acceptance rates and estimation accuracies in three examples which have been widely used in the ABC literature for benchmarking methods. In addition, we investigate the performance of the proposed method when solving a real problem of inferring the transmission dynamics of Streptococcus pneumoniae in day care centers. As estimation accuracy we report the medians of iteration-wise sample means, in addition to the interval: median of the sample mean $\pm\, 1.96\, \cdot$ median of the sample standard deviation. In addition to accuracy, we report the KL divergence of the predicted acceptance region probabilities to investigate its usability as an early termination indicator. Acceptance rates, the total number of generated samples, and the KL convergence monitoring quantity are reported as medians and interquartile ranges over the repetitions.

5.1 Banana shaped distribution

The first example is a two-parameter banana shaped posterior distribution, where the parameter dependence structure is a challenge for the sampling methods. The model was introduced by Haario et al. [1999] and used by e.g. Filippi et al. [2013] as a benchmarking example for ABC SMC. The likelihood and the prior are defined as
$$p(y \mid \theta_1, \theta_2) = \mathrm{Normal}\left(\begin{pmatrix} \theta_1 \\ \theta_1 + \theta_2^2 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 0.5 \end{pmatrix}\right), \quad (28)$$
$$p(\theta_1, \theta_2) = \mathrm{Unif}_{[-50,50]}(\theta_1) \cdot \mathrm{Unif}_{[-50,50]}(\theta_2). \quad (29)$$
As the divergence function we use the Euclidean distance.
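The benchmark model (28)-(29) is straightforward to simulate; a minimal sketch (function names illustrative):

```python
import numpy as np

def banana_prior_sample(rng):
    """Eq. (29): independent Unif[-50, 50] priors on theta_1 and theta_2."""
    return rng.uniform(-50.0, 50.0, size=2)

def banana_simulator(theta, rng):
    """Eq. (28): y ~ Normal((theta_1, theta_1 + theta_2^2), diag(1, 0.5))."""
    mean = np.array([theta[0], theta[0] + theta[1] ** 2])
    return mean + rng.normal(0.0, np.sqrt([1.0, 0.5]))

rng = np.random.default_rng(2)
theta = banana_prior_sample(rng)
y = banana_simulator(theta, rng)        # 2-dimensional synthetic observation
dist = float(np.linalg.norm(y - np.zeros(2)))  # Euclidean discrepancy
```

The quadratic term $\theta_1 + \theta_2^2$ in the mean is what bends the posterior into the characteristic banana shape.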
The methods use the threshold sequence $[\infty, 100, 50, 20, 10, 5, 2, 1]$, and at each iteration the posteriors are approximated with 2000 samples. Acceptance rates provided by the proposed methods are significantly improved compared to the local and global sampling strategies, as seen in Figure 5.

Figure 5: Acceptance rates by iteration round for the four compared methods in the three experiments with 2, 3 and 4 parameters, as described in Sections 5.1, 5.2 and 5.3. Acceptance rates are consistently higher for the proposed method in all experiments. The benefit of the improved acceptance rate accumulates over the iteration rounds.

Figure 6: Total cumulative count of generated samples by iteration round for the four compared methods in the three experiments with 2, 3 and 4 parameters, as described in Sections 5.1, 5.2 and 5.3. The total cumulative number of samples indicates that the new stratified sampling method is more effective than the locally optimal ABC SMC. In the banana shaped and g-and-k distribution examples, the locally optimal ABC SMC has reached the sixth iteration when the newly proposed method is already on the final iteration round. In the Lotka-Volterra model, when the newly proposed method has reached the final iteration round, the other methods have spent at least the same number of samples to reach the third iteration.

Even using only the proposed novel transition kernel (Stratified simple), the acceptance rate is better, but the stratified sampling strategy does improve the results markedly over almost all iterations. The estimation accuracy is slightly improved with the proposed methods, and the early termination convergence monitoring does indicate that the final iteration could be unnecessary. It is notable that the KL divergence might be unreliable in the early iterations of the algorithm, as indicated by the initial increase in the monitored divergence. The cumulative number of generated samples needed to achieve the required amount of samples is reported in Figure 6.
Assuming that we would terminate the execution at the seventh round, we observe that e.g. the locally optimal ABC SMC has reached only the fourth iteration round given the same simulation budget. If we compare the estimation errors reported in Figure 7 on the fourth and seventh iteration rounds of the locally optimal ABC SMC and the stratified sampling ABC SMC, respectively, we notice that the proposed method is much more efficient.

5.2 Lotka-Volterra

The Lotka-Volterra model is a stochastic kinetic system describing predator-prey dynamics. We adapt the implementation from Owen et al. [2015]. The system is defined by three reactions R1, R2 and R3:

Prey birth (R1): X1 → 2 X1   (30)
Prey death & predator birth (R2): X1 + X2 → 2 X2   (31)
Predator death (R3): X2 → 0.   (32)

We simplify our model by fixing the initial numbers of the prey and predator populations to 100 and 50, respectively, and by not adding observation noise to the state. The prior distributions for the hazard rates r_i, i = 1, 2, 3 of the reactions are set as

log(r_i) ∼ Unif(−6, 1), i = 1, 2, 3.   (33)

Figure 7: Evolution of the sample mean and 1.96 · standard deviation and the early termination convergence diagnostic over 8 iterations in the banana shaped distribution example. True parameter values used to simulate all the datasets are marked with dashed lines (θ1 = θ2 = 0).

We use summarised data as observations and the Euclidean distance function.
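The reaction system (30)-(32) can be simulated exactly with the Gillespie algorithm; the following is a minimal sketch (our own implementation, not the one adapted from Owen et al. [2015]), using the fixed initial populations and, for illustration, the true rates (r1, r2, r3) = (2, 0.01, 1) from Figure 8.

```python
import numpy as np

def gillespie_lv(rates, t_max, rng, x1=100, x2=50, max_events=5000):
    """Exact stochastic simulation of the Lotka-Volterra reactions:
    R1: X1 -> 2 X1, R2: X1 + X2 -> 2 X2, R3: X2 -> 0."""
    r1, r2, r3 = rates
    t = 0.0
    traj = [(0.0, x1, x2)]
    while t < t_max and len(traj) < max_events:
        # Reaction hazards under mass-action kinetics
        h = np.array([r1 * x1, r2 * x1 * x2, r3 * x2], dtype=float)
        h_total = h.sum()
        if h_total == 0.0:           # both populations extinct
            break
        t += rng.exponential(1.0 / h_total)
        k = rng.choice(3, p=h / h_total)
        if k == 0:
            x1 += 1                  # prey birth (R1)
        elif k == 1:
            x1 -= 1; x2 += 1         # predation (R2)
        else:
            x2 -= 1                  # predator death (R3)
        traj.append((t, x1, x2))
    return traj

rng = np.random.default_rng(1)
traj = gillespie_lv(rates=(2.0, 0.01, 1.0), t_max=2.0, rng=rng)
```

The `max_events` cap is our own defensive addition to keep a runaway prey population from stalling the sketch; the summary statistics below are then computed from such a trajectory.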
The summaries are defined as

S1 = mean of the prey population over the simulation
S2 = mean of the predator population over the simulation
S3 = logarithm of the variance of the prey population over the simulation
S4 = logarithm of the variance of the predator population over the simulation
S5 = lag-1 autocorrelation of the prey population over the simulation
S6 = lag-1 autocorrelation of the predator population over the simulation
S7 = lag-2 autocorrelation of the prey population over the simulation
S8 = lag-2 autocorrelation of the predator population over the simulation

All methods use the threshold sequence [∞, 200, 100, 90, 80, 70, 60, 50], and at each iteration the posteriors are approximated with 2000 samples. Acceptance rates are again significantly better with the proposed methods, as reported in Figure 5. The estimation accuracy is better with the proposed methods, but the algorithms have not yet reached a stable state; as can be seen in Figure 8, all methods should improve further if the threshold sequence were continued. As reported in Figure 6, on the final iteration round the total number of generated samples is less than half with the stratified sampling ABC SMC compared to the other methods.

Figure 8: Evolution of the sample mean and 1.96 · standard deviation and the early termination convergence diagnostic over 8 iterations in the Lotka-Volterra example. True parameter values used to simulate all observed datasets are marked with dashed horizontal lines (r1 = 2, r2 = 0.01, r3 = 1).

5.3 g-and-k distribution

The third test case we investigate is the univariate g-and-k distribution, following Drovandi and Pettitt [2011]. We can draw a sample from the distribution by first sampling z_i^(p) i.i.d. ∼ Normal(0, 1), i = 1, ..., 50, and then drawing y_i for each i = 1, ..., 50 as

y_i ∼ Q_gk(z_i^(p); A, B, g, k) = A + B · ( 1 + 0.8 · (1 − exp(−g · z_i^(p))) / (1 + exp(−g · z_i^(p))) ) · (1 + (z_i^(p))²)^k · z_i^(p).   (34)

We use uniform prior distributions for the parameters,

A, B, g i.i.d. ∼ Unif(0, 5), k ∼ Unif(0, 2),

and as the summary statistic we use the ordered sequence of observations [y_I(1), y_I(2), ..., y_I(50)], i < j ⇒ y_I(i) ≤ y_I(j). The Euclidean distance is used as the divergence metric. All methods use the threshold sequence [∞, 100, 70, 50, 30, 27, 23, 20], and at each iteration the posteriors are approximated with 5000 samples. We observe similar results to the previous experiments. Acceptance rates are significantly improved with the newly proposed methods, as illustrated in Figure 5. From Figure 9 we see that the estimation accuracy of the proposed method is slightly better compared to the local and global strategies. The early termination quantity does not indicate convergence within the iterations, which can also be seen in the evolution of the posterior mean estimate. In Figure 6 we see that, simulation-budget-wise, the stratified method is a couple of iterations ahead of the locally optimal ABC SMC.

Figure 9: Evolution of the sample mean and 1.96 · standard deviation and the early termination convergence diagnostic over 8 iterations in the g-and-k distribution example. True parameter values used to simulate all observed datasets are marked with dashed horizontal lines (A = 3, B = 1, g = 2, k = 0.5).

6 Bacterial transmissions in day care centers

We investigate the problem of estimating the transmission dynamics of Streptococcus pneumoniae from strain prevalence data collected in day care units in the Oslo metropolitan area. Originally, the transmission model parameters were inferred using ABC SMC in the study by Numminen et al. [2013]. Our formulation of the problem follows a slightly modified model and set of summary statistics, following Gutmann and Corander [2016].
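As a brief aside on the g-and-k example of Section 5.3: drawing from the distribution only requires the quantile transform (34) applied to standard normal draws. A minimal sketch (the function names are ours; the parameter values are the true values from Figure 9):

```python
import numpy as np

def gk_quantile(z, A, B, g, k):
    """g-and-k quantile transform (34) applied to standard normal draws z."""
    skew = (1 - np.exp(-g * z)) / (1 + np.exp(-g * z))
    return A + B * (1 + 0.8 * skew) * (1 + z**2)**k * z

def simulate_gk(theta, n, rng):
    """Draw n i.i.d. g-and-k samples and return the ordered sequence,
    which the text uses as the summary statistic."""
    A, B, g, k = theta
    z = rng.standard_normal(n)
    return np.sort(gk_quantile(z, A, B, g, k))

rng = np.random.default_rng(2)
y = simulate_gk((3.0, 1.0, 2.0, 0.5), n=50, rng=rng)   # true values from Figure 9
```

Since sorting is the only summarisation step, the Euclidean distance between two such ordered vectors is the divergence used in the experiment.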
We use a model where the number of observed children equals the size of the center, i.e. we do not simulate unobserved children. The data, described in detail in Vestrheim et al. [2008], consists of samples collected from 611 healthy children in 29 different day care centers. The 33 most prevalent strains were included in the data set. The transmission dynamics are modeled separately for each day care center as a continuous-time Markov process

P(I_is(t + δt) = 1 | I_is(t) = 0) = β E_s(I(t)) + Λ P_s + o(δt), if Σ_{j=1}^{N_s} I_ij(t) = 0,   (35)
P(I_is(t + δt) = 1 | I_is(t) = 0) = θ (β E_s(I(t)) + Λ P_s) + o(δt), if Σ_{j=1}^{N_s} I_ij(t) > 0, I_is = 0,   (36)
P(I_is(t + δt) = 0 | I_is(t) = 1) = γ + o(δt),   (37)

where β ∼ Unif(0, 11), Λ ∼ Unif(0, 2) and θ ∼ Unif(0, 1) are the internal infection parameter, the external infection parameter and the co-infection parameter, respectively. The first equation defines the probability for a child i to be colonised with strain s when the individual is not colonised with any of the other strains in the model, while the second equation describes the probability when the child is already colonised with at least one other strain. The last equation describes the recovery rate from colonisation. We follow the formulation in Numminen et al. [2013] and set γ = 1, with all other variables scaled accordingly. The data is a single time point snapshot of the colonisations. We set the simulation time to T = 10 time units, and use the state at the end as our observation. The resulting data consists of N_dcc binary matrices I_is^(n), i = 1, ..., N_{c,n}, s = 1, ..., N_s, n = 1, ..., N_dcc, where N_{c,n} is the number of children in day care center n, N_s is the number of bacterial strains and N_dcc is the number of day care centers. We use a set of four summary statistics for each of the day care centers, i.e.
the Shannon index of the strains, the number of different strains, the prevalence of carriage among the children and the prevalence of more than one colonisation among the children. The summarised data of size 4 from the 29 day care centers was first scaled based on the observed data so that the maximum of each of the summaries would be one. After the summary statistic scaling, the empirical cdfs of the joint set of 116 summaries were compared via the L1-norm scaled with the coefficient 1/116. The two versions of the stratified sampling approach are tested along with the locally optimal ABC SMC to infer the model parameters with 5000 approximate posterior samples using the threshold sequence [∞, 0.5, 0.1, 0.08, 0.05]. The results illustrated in Figure 10 indicate that there are no significant differences in the inferred approximate posteriors. However, as seen in Figure 11, with the proposed algorithm we obtain the final posteriors using less than half of the simulations when comparing the stratified and the locally optimal versions. There is a discrepancy between the posterior inferred by Numminen et al. [2013] and our approximate posteriors in the estimate of the parameter Λ, which is most likely caused by the difference in the models used, as described earlier in this section.

7 Conclusions

Here we introduced an approach to improve the acceptance rate of ABC SMC using a stratification of the acceptance region. Using the accumulated information of the posterior predictive distributions, we can re-weight the importance sampling distribution of the ABC SMC to draw particles that are more likely to produce observations that will be accepted at the subsequent iterations of the algorithm. In addition, we described a transition kernel that takes the cascading formulation of the algorithm into account, and a novel early termination diagnostic based on the KL divergence of the predictive distribution of the acceptance regions.
Our simulations confirm the feasibility of the proposed method and demonstrate that it achieves a significantly improved acceptance rate compared to the currently most popular versions of the ABC SMC algorithm. Even if there are no large differences between the inference methods in estimation accuracy when considering specific iteration rounds, the improved acceptance rate means that the newly proposed method reaches the accuracy levels earlier. A restriction of the method, however, is the requirement of determining the threshold sequence beforehand. In practice this means that it is necessary to obtain problem domain specific knowledge about the behavior of the discrepancy measure using explorative analyses of the simulator output with feasible inputs. Of note, many such pre-inference analyses would also match the good practices of the traditional Bayesian workflow [Gelman et al., 2020]. In situations where the a priori analyses are not considered feasible, it would be necessary to resort to adaptive algorithms that do not necessitate pre-specified thresholds. Many adaptive ABC SMC algorithms have been designed to effectively skip unnecessary evaluations of ABC posteriors by finding the threshold values based on the current posterior sample. Even if these methods are not designed to improve the acceptance rate, they can achieve good performance with fewer total evaluations by skipping unnecessary steps when using a fixed threshold sequence. Further work could thus be done to combine more efficient sampling with adaptation of the acceptance regions to the data.

Figure 10: Evolution of the sample mean and the 95% credible interval of the three parameters of the day care center model described in Section 6. The horizontal dashed lines are the sample mean and 95% credible intervals as reported in Numminen et al. [2013].

Figure 11: Acceptance rates and the total number of generated samples by iteration round for the three compared methods in the day care center model described in Section 6. Acceptance rates are consistently higher for the proposed method in all experiments. The benefit of the improved acceptance rate accumulates over the iteration rounds, and eventually the newly proposed method requires only half of the simulator queries to reach the final approximate posterior.

Acknowledgements

This research was supported by the Research Council of Norway, grant no. 299941, and through its Centre of Excellence Integreat - The Norwegian Centre for knowledge-driven machine learning, project number 332645.

References

M. A. Beaumont, W. Zhang, and D. J. Balding. Approximate Bayesian computation in population genetics. Genetics, 162(4):2025-2035, 2002.

F. V. Bonassi and M. West. Sequential Monte Carlo with adaptive weights for approximate Bayesian computation. Bayesian Analysis, 10(1):171-187, 2015. doi: 10.1214/14-BA891.

N. Chopin and O. Papaspiliopoulos. An Introduction to Sequential Monte Carlo. Springer Series in Statistics. Springer Cham, 2020. ISBN 9783030478452.

K. Cranmer, J. Brehmer, and G. Louppe. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48):30055-30062, 2020. ISSN 0027-8424. doi: 10.1073/pnas.1912789117.

C. C. Drovandi and A. N. Pettitt. Likelihood-free Bayesian estimation of multivariate quantile distributions. Computational Statistics and Data Analysis, 55:2541-2556, 2011.

S. Engebretsen, A. Diz-Lois Palomares, G. Rø, A. B. Kristoffersen, J. C. Lindstrøm, K. Engø-Monsen, M. Kamineni, L. Y. Hin Chan, Ø. Dale, J. E. Midtbø, K. L. Stenerud, F. Di Ruscio, R. White, A. Frigessi, and B. F. de Blasio. A real-time regional model for COVID-19: Probabilistic situational awareness and forecasting. PLOS Computational Biology, 19(1):1-26, 2023. doi: 10.1371/journal.pcbi.1010860.

R. G. Everitt and P. A. Rowińska. Delayed acceptance ABC-SMC.
Journal of Computational and Graphical Statistics, 30(1):55-66, 2021. doi: 10.1080/10618600.2020.1775617.

S. Filippi, C. P. Barnes, J. Cornebise, and M. P. Stumpf. On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Statistical Applications in Genetics and Molecular Biology, 12(1):87-107, 2013. doi: 10.1515/sagmb-2012-0069.

A. Gelman, A. Vehtari, D. Simpson, C. C. Margossian, B. Carpenter, Y. Yao, L. Kennedy, J. Gabry, P.-C. Bürkner, and M. Modrák. Bayesian workflow. arXiv preprint, 2020.

M. U. Gutmann and J. Corander. Bayesian optimization for likelihood-free inference of simulator-based statistical models. Journal of Machine Learning Research, 2016.

H. Haario, E. Saksman, and J. Tamminen. Adaptive proposal distribution for random walk Metropolis algorithm. Computational Statistics, 14:375-395, 1999.

A. Jasra, S. Jo, D. Nott, C. Shoemaker, and R. Tempone. Multilevel Monte Carlo in approximate Bayesian computation. Stochastic Analysis and Applications, 37(3):346-360, 2019. doi: 10.1080/07362994.2019.1566006.

A. Lavin, H. Zenil, B. Paige, D. Krakauer, J. Gottschlich, T. Mattson, A. Anandkumar, S. Choudry, K. Rocki, A. G. Baydin, C. Prunkl, O. Isayev, E. Peterson, P. L. McMahon, J. H. Macke, K. Cranmer, J. Zhang, H. M. Wainwright, A. Hanuka, M. Veloso, S. Assefa, S. Zheng, and A. Pfeffer. Simulation intelligence: Towards a new generation of scientific methods. CoRR, abs/2112.03235, 2021.

J. Lintusaari, M. U. Gutmann, R. Dutta, S. Kaski, and J. Corander. Fundamentals and recent developments in approximate Bayesian computation. Systematic Biology, 66:66-82, 2017. doi: 10.1093/sysbio/syw077.

J.-M. Lueckmann, G. Bassetto, T. Karaletsos, and J. H. Macke. Likelihood-free inference with emulator networks. arXiv preprint, 2018.

G. M. Martin, D. T. Frazier, and C. P. Robert. Approximating Bayes in the 21st century. Statistical Science, 2021.

P. Del Moral, A.
Doucet, and A. Jasra. An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, 22(5):1009-1020, 2012.

E. Numminen, L. Cheng, M. Gyllenberg, and J. Corander. Estimating the transmission dynamics of Streptococcus pneumoniae from strain prevalence data. Biometrics, 69(3):748-757, 2013. doi: 10.1111/biom.12040.

J. Owen, D. J. Wilkinson, and C. S. Gillespie. Likelihood free inference for Markov processes: a comparison. Statistical Applications in Genetics and Molecular Biology, 14(2):189-209, 2015. doi: 10.1515/sagmb-2014-0072.

G. Papamakarios, D. C. Sterratt, and I. Murray. Sequential neural likelihood: Fast likelihood-free inference with autoregressive flows. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019, volume 89, pages 837-848. PMLR, 2019.

H. Pesonen, U. Simola, A. Köhn-Luque, H. Vuollekoski, X. Lai, A. Frigessi, S. Kaski, D. T. Frazier, W. Maneesoonthorn, G. M. Martin, and J. Corander. ABC of the future. International Statistical Review, 2022.

L. F. Price, C. C. Drovandi, A. Lee, and D. J. Nott. Bayesian synthetic likelihood. Journal of Computational and Graphical Statistics, 27(1):1-11, 2018.

J. K. Pritchard, M. T. Seielstad, A. Perez-Lezaun, and M. W. Feldman. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution, 16(12):1791-1798, 1999.

D. Silk, S. Filippi, and M. Stumpf. Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems. Statistical Applications in Genetics and Molecular Biology, 5(12):603-618, 2013.

U. Simola, J. Cisewski-Kehe, M. U. Gutmann, and J. Corander. Adaptive approximate Bayesian computation tolerance selection. Bayesian Analysis, 2020.

S. Sisson, Y. Fan, and M. Beaumont. Handbook of Approximate Bayesian Computation.
Chapman & Hall/CRC Handbooks of Modern Statistical Methods. CRC Press, Taylor & Francis Group, 2018. ISBN 9781439881507.

S. A. Sisson, Y. Fan, and M. Tanaka. Sequential Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA, 104(6):1760-1765, 2007.

M. M. Tanaka, A. R. Francis, F. Luciani, and S. A. Sisson. Using approximate Bayesian computation to estimate tuberculosis transmission parameters from genotype data. Genetics, 104(3):1511-1520, 2006. doi: 10.1534/genetics.106.055574.

T. Toni, D. Welch, N. Strelkowa, A. Ipsen, and M. Stumpf. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface, pages 6187-6202, 2009.

D. F. Vestrheim, E. A. Høiby, I. S. Aaberge, and D. A. Caugant. Phenotypic and genotypic characterization of Streptococcus pneumoniae strains colonizing children attending day-care centers in Norway. J Clin Microbiol., 46(8):2508-2518, 2008. doi: 10.1128/JCM.02296-07.

S. Wood. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466:1102-1107, 2010.
