Semi-parametric Bayesian inference under Neyman orthogonality
The validity of two-step or plug-in inference methods is questioned in the Bayesian framework. We study semi-parametric models where the plug-in of a non-parametrically modelled nuisance component is used. We show that when the nuisance and targeted …
Authors: Magid Sabbagh, David A. Stephens
Magid Sabbagh∗
Department of Mathematics and Statistics, McGill University, Montreal, QC, Canada
magid.sabbagh@mail.mcgill.ca

David A. Stephens
Department of Mathematics and Statistics, McGill University, Montreal, QC, Canada
david.stephens@mcgill.ca

February 25, 2026

Abstract

The validity of two-step or plug-in inference methods is questioned in the Bayesian framework. We study semi-parametric models where the plug-in of a non-parametrically modelled nuisance component is used. We show that when the nuisance and targeted parameters satisfy a Neyman orthogonal score property, the approach of cutting feedback through a two-step procedure is a valid way of conducting Bayesian inference. Our method relies on a non-parametric Bayesian formulation based on the Dirichlet process and the Bayesian bootstrap. We show that the marginal posterior of the targeted parameter exhibits good frequentist properties despite not accounting for the inferential uncertainty of the nuisance parameter. We adopt this approach in Bayesian causal inference problems where the nuisance propensity score model is estimated to obtain marginal inference for the treatment effect parameter, and demonstrate that a plug-in of the propensity score has a negligible effect on marginal posterior inference for the causal contrast. We investigate the absence of Neyman orthogonality and exploit our findings to show that in conventional two-step procedures, the posterior distribution converges under weaker restrictions than those needed in the frequentist sequel. For a simple family of useful scores, we demonstrate that even in the absence of Neyman orthogonality, the posterior distribution is asymptotically unchanged by the estimation of the nuisance parameter, merely provided the latter estimator is consistent.
Keywords: Bayesian bootstrap; Bayesian causal inference; two-step estimation; Neyman orthogonality.

∗ Corresponding author

1 Introduction

Suppose that the true value $\theta_0 \in \Theta \subset \mathbb{R}^p$ of a parameter $\theta$ in a statistical model minimizes the expected loss,
\[ \theta_0 \equiv \arg\min_{\theta} E_{P_O}\{ l(O;\theta) \}, \]
for some loss function $l : \mathcal{O} \times \Theta \to [0,\infty)$, where the measure $P_O$, with distribution function $F_O$, is the data-generating model. We seek to perform Bayesian inference for $\theta_0$. If $P_O$ is unknown but is assumed to belong to a set $\mathcal{F}$, any prior $P_{\mathcal{F}}$ on $\mathcal{F}$ induces a prior $P_\Theta$ on $\Theta$ by considering, for $F \in \mathcal{F}$,
\[ \theta(F) = \arg\min_{\theta} \int_{\mathcal{O}} l(o;\theta)\, dF(o), \]
and then defining
\[ P_\Theta(\theta \in A) = P_{\mathcal{F}}\{ F : \theta(F) \in A \}. \]
Even if $P_{\mathcal{F}}$ is a non-parametric prior, that is, no finite-dimensional parameter characterizes $F_O$ precisely, the induced prior on $\theta_0$ is still well-defined. However, in general $\theta_0$ does not characterize $F_O$, so standard approaches to inference using a parametric likelihood cannot be used. Semi-parametric Bayesian inference is necessary in situations of partial specification where components of the model, such as moments, are parametrized, but the remainder of the data-generating mechanism is modeled non-parametrically.

We focus on applications in causal inference as our motivation. Semi-parametric models are common in causal inference, where the task is to infer the effect of an intervention (a treatment, for example) on an outcome subject to confounding by other measured variables (Rubin, 1977, 1979; Pearl, 2009). In this setting, the variation in the outcome as a function of the treatment is specified only via a mean model whose form is deemed too complex to represent using standard approaches such as regression.
The propensity score, a function that encapsulates the treatment assignment model, has played a key role in adjusting for confounding in observational studies (Rosenbaum & Rubin, 1983, 1984). In most cases, the treatment assignment model needs to be estimated from the observed data, with the fitted values used to carry out adjustment. The use of fitted values of the propensity score is common in the classical frequentist literature, but remains controversial in the Bayesian setting; in particular, it is debated whether valid Bayesian inference using this plug-in is possible. Li et al. (2023) survey methods in Bayesian causal inference and discuss thoroughly the advantages and drawbacks of the methods generally deployed, such as cutting feedback (McCandless et al., 2009) and two-step approaches (Kaplan & Chen, 2012). The Linked Bayesian Bootstrap (Stephens et al., 2023) overcomes the difficulties found in the previous approaches. It relies on a Bayesian non-parametric formulation and puts priors on the distributions rather than on finite-dimensional parameters. The non-parametric prior-to-posterior update provides a way of obtaining a sample from the posterior of the treatment effect, a notion that we revisit in section 2. This procedure provides the correct way of handling the inferential uncertainty of the fitted propensity score, yielding good frequentist properties.

Central to the arguments in this paper is the concept of Neyman orthogonality. If $\theta_0$ and $h_0$ are respectively the target and nuisance parameters satisfying the moment restriction $E_{P_O}\{ m(O;\theta_0,h_0) \} = 0$, we say that the score $m$ is Neyman orthogonal if
\[ \frac{d}{dt} E_{P_O}\big[ m\{ O;\theta_0, h_0 + t(h - h_0) \} \big] \Big|_{t=0} = 0 \]
for all possible $h$. Neyman orthogonality, which we revisit and properly define in section 3.2, has been the foundation of the seminal work of Newey (1994) and Chernozhukov et al. (2018).
Although these works differ in the restrictions they impose on the nuisance parameter, both have exploited Neyman orthogonality to reduce the impact of estimating the nuisance parameters on the marginal frequentist inference for the parameter of interest. In our motivating setting, many quantities are based on Neyman orthogonal scores; for example, the average treatment effect, the average treatment effect on the treated, and the local average treatment effect all satisfy Neyman orthogonal scores. Such scores are often linear when considered as functions of $\theta \in \Theta$. In the Bayesian setting, we show that under Neyman orthogonality, one may avoid incorporating Bayesian inference for the nuisance parameters and work with frequentist plug-in estimates.

Recently, Yiu et al. (2025) used efficient influence functions to obtain posterior corrections of one-step estimators based on the Bayesian bootstrap. In fact, these corrections induce Neyman orthogonality; see Chernozhukov et al. (2018) for a relation between influence functions and Neyman orthogonal scores. The approach we adopt is based on targeting via an estimating equation, and on a Bayesian/frequentist duality statement. Furthermore, we capitalize on the Bayesian/frequentist duality to establish that convergence of the posterior distribution still holds even when Neyman orthogonality is not met, and we show, for a simple family of scores, that the estimation of the nuisance parameter, provided it is consistent, does not asymptotically impact the posterior distribution, irrespective of its convergence rate.

2 Targeting Parameters of Interest

2.1 Notation

We adopt the following notation: $O_1,\dots,O_n$ are i.i.d. random variables taking values in $(\mathcal{O},\mathcal{B})$ with distribution $P_O$. Define the empirical process $P_n$ and the bootstrapped empirical process $P_n^w$ by
\[ P_n = \frac{1}{n}\sum_{i=1}^n \delta_{O_i} \quad\text{and}\quad P_n^w = \sum_{i=1}^n w_{in}\,\delta_{O_i}, \]
respectively, where $(w_{1n},\dots,w_{nn})$ are random weights. When needed, and mainly for notational convenience, we define $P : L_1(\mathcal{O},\mathcal{B},P_O) \to \mathbb{R}$ by
\[ Pf := E_{P_O}\{ f(O) \} = \int_{\mathcal{O}} f(o)\, dP_O(o). \]
In the rest of the paper, weights drawn from a distribution $P_W$ are assumed to be independent of the data. The joint measure is hence $P_{OW} = P_O^\infty \times P_W$. We write $P_O$ instead of $P_O^\infty$ for simplicity. The relevant probability spaces are properly defined in the Supplementary Material.

The theory developed in section 3.4 can be extended to other families of weights (Praestgaard & Wellner, 1993), including Efron's bootstrap (Efron, 1979) and the double bootstrap. In fact, $P_W$ can be the distribution of a vector $W_n = (w_{1n},\dots,w_{nn})$ with exchangeable non-negative components summing to 1 with the following properties:

1. $\lim_{\lambda\to\infty} \limsup_{n\to\infty} \sup_{x \ge \lambda} x^2 P(n w_{1n} > x) = 0$,
2. $\limsup_{n\to\infty} n \lVert w_{1n} \rVert_{2,1} \le C$ for some $C > 0$, where $\lVert w_{1n} \rVert_{2,1} = \int_0^1 \sqrt{P(w_{1n} > t)}\, dt$,
3. $n^{-1}\sum_{i=1}^n (n w_{in} - 1)^2 = c^2 + o_{P_W}(1)$ for some $c > 0$.

In this paper, we suppose $W_n \sim \text{Dirichlet}(1,\dots,1)$ (so that $c = 1$) because of the Bayesian interpretability of the results.

2.2 Posterior Distribution of Targeted Parameters

The approach to targeting parameters that involves expressing them as minimizers of loss functions has been considered as a partial specification by Bissiri et al. (2016) in the absence of likelihoods. Lyddon et al. (2019), who generalized the work of Newton & Raftery (1994), obtain a sample from the posterior distribution of $\theta(P_O)$ using a Bayesian non-parametric computational approach. This strategy relies on the Dirichlet process (Ferguson, 1973), acting as a prior on distribution functions. Suppose that $\alpha$ is a finite Borel measure on $(\mathcal{O},\mathcal{B})$. If a Dirichlet process prior $\text{DP}(\alpha)$ is placed on $F$, then the posterior distribution of $F \mid O_1,\dots,O_n$ is the Dirichlet process $\text{DP}(\alpha + n P_n)$. As $\alpha(\mathcal{O}) \to 0$, the posterior distribution can be represented by the empirical process $P_n^w$ with $(w_{1n},\dots,w_{nn}) \sim \text{Dir}(1,\dots,1)$. A sample from the posterior of the target parameter can hence be obtained by solving
\[ \arg\min_{\theta} \int_{\mathcal{O}} l(O;\theta)\, dP_n^w = \arg\min_{\theta} \sum_{i=1}^n w_{in}\, l(O_i;\theta). \]
A similar argument can be constructed if $\theta_0 \equiv \theta_0(P_O)$ is the solution to a moment restriction $E_{P_O}\{ m(O;\theta) \} = 0$. A posterior sample for the targeted parameter can be obtained by solving
\[ \int_{\mathcal{O}} m(O;\theta)\, dP_n^w \equiv \sum_{i=1}^n w_{in}\, m(O_i;\theta) = 0. \]
These computational strategies hinge on the fact that the targeted parameter satisfies a moment restriction or minimizes a loss function. Such partial specifications provide a way to bypass knowledge of the full statistical model or data-generating process.

Posterior distributions computed via the Bayesian bootstrap enjoy good properties. The following theorem (Kosorok, 2008) provides the Bayesian/frequentist duality that ensures them.

Theorem 1. Let $\hat\theta_n$ and $\hat\theta_{n,BB}$ satisfy
\[ \sum_{i=1}^n m(O_i;\hat\theta_n) = 0 \quad\text{and}\quad \sum_{i=1}^n w_{in}\, m(O_i;\hat\theta_{n,BB}) = 0, \]
respectively. Then, under regularity conditions, $\sqrt{n}(\hat\theta_n - \theta_0)$ and $\sqrt{n}(\hat\theta_{n,BB} - \hat\theta_n) \mid O_1,\dots,O_n$ converge in distribution as $n \to \infty$ to a random variable with a $N(0,L)$ distribution, where
\[ L = \left[ E_{P_O}\left\{ \frac{\partial m(O;\theta_0)}{\partial\theta^\top} \right\} \right]^{-1} E_{P_O}\left\{ m(O;\theta_0)\, m(O;\theta_0)^\top \right\} \left[ E_{P_O}\left\{ \frac{\partial m(O;\theta_0)}{\partial\theta^\top} \right\} \right]^{-\top}. \]

The conditional asymptotic distribution of $\sqrt{n}(\hat\theta_{n,BB} - \hat\theta_n) \mid O_1,\dots,O_n$ from Theorem 1 is the same as the limiting distribution of $\sqrt{n}(\hat\theta_n - \theta_0)$. However, the randomness in the unconditional statement comes through the random variables $O_1,\dots,O_n$, while in the conditional one, the observations $O_1,\dots,O_n$ are fixed, and the only source of randomness is the Dirichlet weights. When the duality between the frequentist statement and its Bayesian counterpart holds, Bayesian credible regions have asymptotically the correct coverage probability and coincide with the frequentist confidence intervals based on $\hat\theta_n$.

2.3 Nuisance Parameters

In the presence of a nuisance parameter, the above formulations may lack clarity. In the example of section 1, it is unclear how to incorporate the propensity score in order to infer the treatment effect. Stephens et al. (2023) propose a solution to this problem when the nuisance parameter is itself the minimizer of a loss function or a solution to a parametrically specified estimating equation. If the propensity score is parametrically modeled, then inference for the nuisance propensity score model can proceed using standard approaches, for example through an estimating equation based on a function $u(o;h)$, in addition to the principal estimating function $m(o;\theta,h)$, where $E_{P_O}\{ m(O;\theta_0,h_0) \} = 0$. In this case Sabbagh & Stephens (2026) establish the principal results concerning Bayesian inference, namely:

(i) there exists a Bayesian/frequentist duality that permits approximate Bayesian inference via a study of the frequentist counterpart. This duality delivers good frequentist properties, including asymptotic normality and coverage at the nominal level.
(ii) a plug-in method, which relies on simply using an estimate of the propensity model for adjustment, produces an approximately Normal posterior that exhibits consistent recovery of the target parameter. The plug-in estimator must be consistent for $h_0$ at the usual parametric rate. However, this is not the optimal Bayesian solution, even if the plug-in is the true value $h_0$.
(iii) a Bayesian approach that correctly propagates uncertainty in the propensity model produces a posterior distribution with lower posterior variance than the posterior computed with a plug-in estimate. Specifically, the Linked Bayesian Bootstrap, which uses a single set of weights and yields the simultaneous system
\[ \int_{\mathcal{O}} \begin{pmatrix} m(O;\theta,h) \\ u(O;h) \end{pmatrix} dP_n^w = \sum_{i=1}^n w_{in} \begin{pmatrix} m(O_i;\theta,h) \\ u(O_i;h) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \]
provides the optimal approach based on $m$.
(iv) in this case, $\theta$ and $h$ are a posteriori asymptotically independent.

These results apply in the purely parametric setting, and in the semi-parametric setting in which the nuisance parameter is finite-dimensional.

3 Non-parametric Assumptions

3.1 Motivation

A question that remains is how to handle the case when no parametric assumptions are made to estimate the nuisance parameter, and the only viable approach is through non-parametric methods. In this case, we must assess how to propagate uncertainty in estimating the nuisance parameter, $h_0$ say, into the Bayesian inference step for the parameter of interest, and how to ensure good frequentist properties.

Suppose that the true values $\theta_0 \in \Theta$ of a targeted parameter $\theta$ and $h_0 \in \mathcal{H}$ of the nuisance parameter $h$ satisfy the moment condition $E_{P_O}\{ m(O;\theta_0,h_0) \} = 0$, where $m : \mathcal{O} \times \Theta \times \mathcal{H} \to \mathbb{R}^p$. If $h_0$ were known, a posterior for $\theta_0$ could be obtained by solving
\[ \sum_{i=1}^n w_{in}\, m(O_i;\theta,h_0) = 0. \]
If $h_0$ is unknown, a posterior for $\theta_0$, conditional on an estimator $\hat h$ of $h_0$, can be obtained by solving
\[ \sum_{i=1}^n w_{in}\, m(O_i;\theta,\hat h) = 0. \]
As seen in section 2.3, when a parametric model arising from an estimating equation is available for $h_0$, a simple plug-in without propagating the uncertainty is in general not the optimal way to perform Bayesian analysis. However, when no parametric model is available and flexible methods must be used, it is non-trivial to come up with an analogue to the Linked Bayesian Bootstrap.
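The contrast between the plug-in and the Linked Bayesian Bootstrap in the parametric case of section 2.3 can be sketched numerically. The example below is only an illustration under assumptions that are not taken from the text: a toy design with a binary confounder, a saturated propensity model $h = (h_0, h_1)$ with $u(o;h) = \{ 1(x = k)(z - h_k) \}_{k=0,1}$, and the inverse-weighting score $m(o;\theta,h) = zy/h_x - \theta$. All function names are hypothetical.

```python
import random
import statistics


def dirichlet_weights(n, rng):
    """Draw (w_1, ..., w_n) ~ Dirichlet(1, ..., 1) by normalising Exp(1) draws."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [v / total for v in g]


def simulate(n, rng):
    """Toy data (hypothetical design): binary confounder x, treatment z, outcome y."""
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        e0 = 0.6 if x == 1 else 0.3            # true propensity score
        z = 1 if rng.random() < e0 else 0
        y = 5.0 + x + rng.gauss(0.0, 1.0)      # the target is E(Y) = 5.5
        data.append((x, z, y))
    return data


def solve_h(data, w):
    """Solve sum_i w_i u(O_i; h) = 0 with u = (1{x = k} (z - h_k)), k = 0, 1."""
    h = []
    for k in (0, 1):
        num = sum(wi * z for wi, (x, z, _) in zip(w, data) if x == k)
        den = sum(wi for wi, (x, _, _) in zip(w, data) if x == k)
        h.append(num / den)
    return h


def solve_theta(data, w, h):
    """Solve sum_i w_i m(O_i; theta, h) = 0 with m = z * y / h[x] - theta."""
    return sum(wi * z * y / h[x] for wi, (x, z, y) in zip(w, data))


def posterior_draws(data, n_draws, rng, linked):
    n = len(data)
    h_hat = solve_h(data, [1.0 / n] * n)       # plug-in estimate, computed once
    draws = []
    for _ in range(n_draws):
        w = dirichlet_weights(n, rng)
        # Linked: the same weights re-solve the nuisance equation in every draw.
        h = solve_h(data, w) if linked else h_hat
        draws.append(solve_theta(data, w, h))
    return draws


rng = random.Random(0)
data = simulate(400, rng)
plug_in = posterior_draws(data, 300, rng, linked=False)
linked = posterior_draws(data, 300, rng, linked=True)
print("plug-in posterior variance:", statistics.variance(plug_in))
print("linked posterior variance: ", statistics.variance(linked))
```

In this toy design the linked draws should show a visibly smaller posterior variance than the plug-in draws, in line with result (iii); the size of the gap depends on the assumed design.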
In the absence of a parametric likelihood or estimating equations defining the nuisance parameter, accounting for uncertainty through the Linked Bayesian Bootstrap becomes considerably more difficult. We explore other questions that can motivate our proposed solution to this problem:

1. What if our model does not propagate uncertainty but at the same time is robust to biased but consistent estimation of $h_0$?
2. In that case, do we get good frequentist properties, and the Bayesian/frequentist duality?

We establish some of the key conditions in the next section.

3.2 Neyman orthogonality

In the remaining parts of the paper, we suppose that the targeted parameter space $\Theta$ is a subset of $\mathbb{R}^p$ and that the nuisance space $\mathcal{H}$ is a convex subset of a normed space $\mathcal{N}$. For a vector $v \in \mathbb{R}^p$, we write $\lVert v \rVert_{p,2}$ for the Euclidean norm of $v$, and for $h \in \mathcal{H}$, we write $\lVert h \rVert_{\mathcal{H}}$ for the norm of $h$. If $h, h' \in \mathcal{H}$, we write, for notational convenience, $\lVert h - h' \rVert_{\mathcal{H}}$ for the norm of $h - h'$ in the ambient normed vector space $\mathcal{N}$. The true values $\theta_0 \in \Theta$ of $\theta$ and $h_0 \in \mathcal{H}$ of $h$ satisfy the moment restriction $E_{P_O}\{ m(O;\theta_0,h_0) \} = 0$, for some score function $m : \mathcal{O} \times \Theta \times \mathcal{H} \to \mathbb{R}^p$.

Definition 1. We say that the score $m$ is orthogonal with respect to $h$ if $f'_{h-h_0}(0) = 0$ for all $h \in \mathcal{H}$, where
\[ f_{h-h_0}(t) = E_{P_O}\big[ m\{ O;\theta_0, h_0 + t(h - h_0) \} \big], \]
and $f'$ denotes the ordinary first derivative of $f$ with respect to its argument.

In order to simplify notation, the subscript $h - h_0$ will no longer be written.

3.3 Problem formulation and assumptions

Let $\hat h$ be an estimator of $h_0$ and denote by $\hat\theta_n$ and $\hat\theta_{n,BB}$ the respective solutions to
\[ \sum_{i=1}^n m(O_i;\hat\theta_n,\hat h) = 0 \quad\text{and}\quad \sum_{i=1}^n w_{in}\, m(O_i;\hat\theta_{n,BB},\hat h) = 0. \]
Our goal is to establish the asymptotic properties of $\sqrt{n}(\hat\theta_n - \theta_0)$ and $\sqrt{n}(\hat\theta_{n,BB} - \hat\theta_n) \mid O_1,\dots,O_n$, and to show that both limiting distributions are Normal with zero mean and variance $\Sigma$, where
\[ \Sigma = M_{\theta_0}^{-1}\, E_{P_O}\left\{ m(O;\theta_0,h_0)\, m(O;\theta_0,h_0)^\top \right\} M_{\theta_0}^{-\top}, \]
and
\[ M_{\theta_0} = E_{P_O}\left\{ \frac{\partial m(O;\theta_0,h_0)}{\partial\theta^\top} \right\} \]
is an invertible $p \times p$ matrix. The estimator $\hat h$ could be an estimator of Bayesian nature, arising from non-parametric methods such as BART (see, for example, Hahn et al., 2020). We underline that $\hat\theta_n$ is introduced exclusively as a tool that facilitates establishing the Bayesian/frequentist duality and constructing Bayesian credible intervals. The role played by $\hat\theta_n$ when this duality does not hold is discussed in sections 5.1 and 5.2. We make the following assumptions:

Assumption 1. $h_0$ is consistently estimated by $\hat h \in H_n$, where $\{ H_i \}_{i=1}^\infty$ are shrinking neighborhoods of $h_0$.

Assumption 2. $\hat\theta_{n,BB}$ and $\hat\theta_n$ are unconditionally consistent estimators of $\theta_0$.

Sufficient conditions for Assumption 2 to hold are given in Newey (1994). Moreover, if the score $m$ is linear in $\theta$, Assumption 2 is generally satisfied.

Assumption 3. The score $m(O;\theta_0,h_0)$ is a Neyman orthogonal score, satisfying Definition 1.

Assumption 4. The class
\[ \left\{ \frac{\partial m(O;\theta,h)}{\partial\theta^\top} : \lVert \theta - \theta_0 \rVert_{p,2} < \delta_1,\ \lVert h - h_0 \rVert_{\mathcal{H}} < \delta_2 \right\} \]
is $P_O$-Glivenko-Cantelli for some $\delta_1, \delta_2 > 0$, and the function $(\theta,h) \mapsto E_{P_O}\{ \partial m(O;\theta,h) / \partial\theta^\top \}$ is continuous.

Assumption 5. The class $\mathcal{G} = \{ m(O;\theta_0,h) : \lVert h - h_0 \rVert_{\mathcal{H}} < \delta \}$ is $P_O$-Donsker for some $\delta > 0$, and
\[ E_{P_O} \lVert m(O;\theta_0,h) - m(O;\theta_0,h_0) \rVert_{p,2}^2 \to 0 \quad\text{as}\quad \lVert h - h_0 \rVert_{\mathcal{H}} \to 0. \]

Assumption 5 is in general sufficient to ensure that the terms studied in Lemma 3 converge unconditionally to zero. While the Donsker assumptions may be limiting, recent results give alternative conditions under which both terms studied in Lemma 3 converge to 0 as $n \to \infty$; see Yiu et al. (2025).

Assumption 6. $\sqrt{n}\, E_{P_O}\{ m(O;\theta_0,\hat h) \} \to 0$ in probability as $n \to \infty$.

Neyman orthogonality of the score $m$ is an important condition in ensuring that Assumption 6 is satisfied. In section 5.1, we study the effect of the absence of Neyman orthogonality on the marginal posterior of the targeted parameter. The following lemma, whose proof can be found in the Supplementary Material, gives sufficient conditions for Assumption 6 to hold.

Lemma 1. Let $f(t) = E_{P_O}[ m\{ O;\theta_0, h_0 + t(h - h_0) \} ]$. If
\[ \int_0^1 \lVert f''(t) \rVert_{p,2}\, dt = o(n^{-1/2}) \]
uniformly in $h \in \mathcal{H}$, then $\sqrt{n}\, E_{P_O}\{ m(O;\theta_0,h) \} \to 0$ uniformly in $h \in \mathcal{H}$ as $n \to \infty$.

The connection with orthogonality comes through Taylor's theorem: since $f(0) = 0$ and, under Neyman orthogonality, $f'(0) = 0$, we have $E_{P_O}\{ m(O;\theta_0,h) \} = f(1) = \int_0^1 (1 - t) f''(t)\, dt$. The size of the second derivative $f''$ therefore determines the rates at which the nuisance parameters need to be estimated. We require in general the reasonable estimation rate $\lVert \hat h - h_0 \rVert_{\mathcal{H}} = o_{P_O}(n^{-1/4})$. In many examples, the nuisance parameter $h$ consists of two distinct components $h_1$ and $h_2$ with respective true values $h_{10}$ and $h_{20}$, lying respectively in convex subsets $\mathcal{H}_1$ and $\mathcal{H}_2$ of two normed spaces $N_1$ and $N_2$. The nuisance set is hence $\mathcal{H} = \mathcal{H}_1 \times \mathcal{H}_2$, a convex subset of $N_1 \times N_2$. We can then define
\[ \lVert h \rVert_{\mathcal{H}} = \lVert (h_1,h_2) \rVert_{\mathcal{H}} := \max\left( \lVert h_1 \rVert_{\mathcal{H}_1}, \lVert h_2 \rVert_{\mathcal{H}_2} \right). \]
Requiring that $h_{10}$ and $h_{20}$ are both estimated at an $o(n^{-1/4})$ rate suffices to ensure that $\lVert \hat h - h_0 \rVert_{\mathcal{H}} = o_{P_O}(n^{-1/4})$. More refined conditions can be imposed on a case-by-case basis.

3.4 Asymptotic Properties

We establish answers to the questions posed at the start of section 3 via a succession of lemmas. It is important to outline the derivation in order to highlight the role of each of the above assumptions.
We have that
\[ 0 = \frac{1}{\sqrt{n}}\sum_{i=1}^n m(O_i;\hat\theta_n,\hat h) = \frac{1}{\sqrt{n}}\sum_{i=1}^n m(O_i;\theta_0,\hat h) + \left\{ \frac{1}{n}\sum_{i=1}^n \frac{\partial m(O_i;\bar\theta_n,\hat h)}{\partial\theta^\top} \right\} \sqrt{n}\,(\hat\theta_n - \theta_0) \]
and
\[ 0 = \sqrt{n}\sum_{i=1}^n w_{in}\, m(O_i;\hat\theta_{n,BB},\hat h) = \sqrt{n}\sum_{i=1}^n w_{in}\, m(O_i;\theta_0,\hat h) + \left\{ \sum_{i=1}^n w_{in} \frac{\partial m(O_i;\tilde\theta_n,\hat h)}{\partial\theta^\top} \right\} \sqrt{n}\,(\hat\theta_{n,BB} - \theta_0) \]
for some $\bar\theta_n \in [\theta_0,\hat\theta_n]$ and $\tilde\theta_n \in [\theta_0,\hat\theta_{n,BB}]$, where $[a,b]$ denotes the line segment between $a$ and $b$. Another expansion yields
\begin{align*}
\sqrt{n}\sum_{i=1}^n w_{in}\, m(O_i;\theta_0,\hat h) ={}& \sqrt{n}\sum_{i=1}^n w_{in}\, m(O_i;\theta_0,h_0) \\
&+ \sqrt{n}\sum_{i=1}^n \left( w_{in} - \frac{1}{n} \right) \left\{ m(O_i;\theta_0,\hat h) - m(O_i;\theta_0,h_0) \right\} \\
&+ \sqrt{n}\left[ \frac{1}{n}\sum_{i=1}^n \left\{ m(O_i;\theta_0,\hat h) - m(O_i;\theta_0,h_0) \right\} - E_{P_O}\left\{ m(O;\theta_0,\hat h) - m(O;\theta_0,h_0) \right\} \right] \\
&+ \sqrt{n}\, E_{P_O}\{ m(O;\theta_0,\hat h) \}.
\end{align*}
Setting the weights equal to $1/n$ gives the corresponding expansion of $n^{-1/2}\sum_{i=1}^n m(O_i;\theta_0,\hat h)$.

Lemma 2. If Assumptions 1, 2, and 4 are satisfied, then both
\[ \frac{1}{n}\sum_{i=1}^n \frac{\partial m(O_i;\bar\theta_n,\hat h)}{\partial\theta^\top} \quad\text{and}\quad \sum_{i=1}^n w_{in} \frac{\partial m(O_i;\tilde\theta_n,\hat h)}{\partial\theta^\top} \]
converge unconditionally to $M_{\theta_0}$ in probability.

Proof. This fact follows from the exchangeable bootstrap for Glivenko-Cantelli classes. □

Lemma 3. If Assumptions 1 and 5 are satisfied, then

1. $\sqrt{n}\sum_{i=1}^n \left( w_{in} - \frac{1}{n} \right) \left\{ m(O_i;\theta_0,\hat h) - m(O_i;\theta_0,h_0) \right\}$
2. $\sqrt{n}\left[ \frac{1}{n}\sum_{i=1}^n \left\{ m(O_i;\theta_0,\hat h) - m(O_i;\theta_0,h_0) \right\} - E_{P_O}\left\{ m(O;\theta_0,\hat h) - m(O;\theta_0,h_0) \right\} \right]$

converge unconditionally to zero in probability as $n \to \infty$.

Proof. This fact follows from the exchangeable bootstrap for Donsker classes, in particular from the fact that the processes $\sqrt{n}(P_n^w - P_n)$ and $\sqrt{n}(P_n - P)$ converge unconditionally in distribution to a tight Gaussian process in $L^\infty(\mathcal{G})$. □

Corollary 1. If Assumptions 1-6 are satisfied, then
\[ \sqrt{n}\,(\hat\theta_{n,BB} - \theta_0) = -M_{\theta_0}^{-1} \sqrt{n}\sum_{i=1}^n w_{in}\, m(O_i;\theta_0,h_0) + o_{P_{OW}}(1). \]

We point out that the unconditional asymptotic distribution of $\sqrt{n}(\hat\theta_{n,BB} - \theta_0)$ is Normal with zero mean and variance $2\Sigma$: the randomness in the weights and the randomness in the data each contribute $\Sigma$. Combining the previous lemmas, we obtain the following theorem:

Theorem 2. If Assumptions 1-6 are satisfied, then
\[ \sqrt{n}\,(\hat\theta_n - \theta_0) = -M_{\theta_0}^{-1} \frac{1}{\sqrt{n}}\sum_{i=1}^n m(O_i;\theta_0,h_0) + o_{P_O}(1), \]
\[ \sqrt{n}\,(\hat\theta_{n,BB} - \hat\theta_n) = -M_{\theta_0}^{-1} \sqrt{n}\sum_{i=1}^n \left( w_{in} - \frac{1}{n} \right) m(O_i;\theta_0,h_0) + o_{P_{OW}}(1). \]
Those results together imply that $\sqrt{n}(\hat\theta_n - \theta_0)$ and $\sqrt{n}(\hat\theta_{n,BB} - \hat\theta_n) \mid O_1,\dots,O_n$ converge in distribution to random variables with $N(0,\Sigma)$ distributions (almost surely).

We establish this duality when the score $m$ is not necessarily differentiable with respect to $\theta$ in the Supplementary Material. In light of Theorem 2, we adopt the following computational strategy to sample from a posterior distribution of the targeted parameter $\theta_0$, in the presence of a nuisance parameter $h_0$, satisfying $E_{P_O}\{ m(O;\theta_0,h_0) \} = 0$ where $m$ is an orthogonal score. It is hence not necessary to propagate the inferential uncertainty attached to the estimation of the nuisance parameter. Indeed, an estimator of $h_0$ that is held fixed during the entire procedure ensures that the posterior distribution possesses optimal frequentist properties. Moreover, the conditional distribution of $\sqrt{n}(\hat\theta_{n,BB} - \hat\theta_n) \mid O_1,\dots,O_n$ can be seen as an approximation to the frequentist sampling density of $\sqrt{n}(\hat\theta_n - \theta_0)$.

Algorithm 1. Sampling from the Posterior Distribution
  Estimate $\hat h$ (possibly non-parametrically) on $O_1,\dots,O_n$.
  For $j = 1$ to $j = B$:
    Draw random weights $(w_{1n}^{(j)},\dots,w_{nn}^{(j)}) \sim \text{Dir}(1,\dots,1)$.
    Find the solution $\theta^{(j)}$ to $\sum_{i=1}^n w_{in}^{(j)}\, m(O_i;\theta,\hat h) = 0$.
  Output $\theta^{(1)},\dots,\theta^{(B)}$.

The distribution $P_W$ of the weights can be set to be $n^{-1}\,\text{Multinomial}(n, n^{-1},\dots,n^{-1})$ without distorting the asymptotic results obtained in Theorem 2. In that case the process $P_n^w$ coincides with Efron's bootstrap. The estimate $\hat h$ can be obtained prior to bootstrapping and need not be updated with each bootstrap iteration. Alternatively, different estimators $\hat h^{(1)},\dots,\hat h^{(B)}$ may be used, with $\hat h^{(j)}$ used for the $j$-th bootstrap replicate, without altering the result of Theorem 2, and this may yield better finite-sample performance. As an example, suppose that $\hat h_1$ and $\hat h_2$ are estimators of $h_0$. For $j \in \{1,2\}$, we consider the solutions to $\sum_{i=1}^n m(O_i;\theta,\hat h_j) = 0$. Applying Algorithm 1 successively for $\hat h_1$ and $\hat h_2$ and collating the results yields a posterior sample of size $B_1 + B_2$.

4 Simulations

We consider the following model and implement Algorithm 1. The data-generating mechanism is as follows: let $X \in \mathbb{R}^q$ be distributed according to a multivariate Normal distribution with zero mean and covariance matrix $\Sigma_X \in \mathbb{R}^{q \times q}$, whose $(i,j)$-th entry equals $0.8^{|i-j|/4}$. We then generate the binary treatment $Z$ and outcome $Y$ such that
\[ Y = \theta_0 Z + g_0(X) + U, \tag{1} \]
\[ Z \mid X \sim \text{Bern}\{ e_0(X) \}, \tag{2} \]
where $\theta_0 = 3$, $U \sim N(0,1)$,
\[ g_0(X) = X_1 + \sin(X_2 + X_3) + \cos(X_3) + |X_4| + X_q, \]
\[ e_0(X) = \frac{1}{2}\, \frac{\exp\left( \sum_{j=1}^q X_j \right)}{1 + \exp\left( \sum_{j=1}^q X_j \right)} + \frac{1}{5}. \]
In this section, we report the results for $q = 5$ for three different sample sizes. We estimate $k_{0y}(X) = E_{P_O}(Y \mid X)$ and $e_0(X)$ using random forests with sub-sampling without replacement, with subsample size $m = n^{0.49}$. We refer to Chen et al. (2022) for a theoretical justification of the stochastic equicontinuity property satisfied by this procedure, as well as bounds on the mean-squared error rates achieved.
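The steps of Algorithm 1 can be sketched for a score that is linear in $\theta$, here the partialled-out score $[Y - k_y(X) - \theta\{Z - e(X)\}]\{Z - e(X)\}$ used in this section. The sketch is not the simulation design of the paper: it assumes a toy model with a scalar covariate, and it replaces the random forests by a simple binned-mean (regressogram) estimator so that it runs with the Python standard library alone. All function names are hypothetical.

```python
import math
import random
import statistics


def dirichlet_weights(n, rng):
    """Draw (w_1, ..., w_n) ~ Dirichlet(1, ..., 1) by normalising Exp(1) draws."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [v / total for v in g]


def simulate(n, theta0, rng):
    """Toy partially linear model with a scalar covariate (assumed design)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        e0 = 0.5 / (1.0 + math.exp(-2.0 * x)) + 0.2   # propensity in (0.2, 0.7)
        z = 1.0 if rng.random() < e0 else 0.0
        y = theta0 * z + math.sin(3.0 * x) + rng.gauss(0.0, 1.0)
        data.append((x, z, y))
    return data


def binned_fit(xs, vs, n_bins=20):
    """Regressogram stand-in for a flexible learner: bin means of v over [-1, 1]."""
    def bin_of(x):
        return min(int((x + 1.0) / 2.0 * n_bins), n_bins - 1)
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, v in zip(xs, vs):
        sums[bin_of(x)] += v
        counts[bin_of(x)] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [means[bin_of(x)] for x in xs]


def algorithm_1(data, n_draws, rng):
    xs, zs, ys = zip(*data)
    # Step 1: estimate h_hat = (k_hat, e_hat) once, on the full sample.
    k_hat = binned_fit(xs, ys)
    e_hat = binned_fit(xs, zs)
    ry = [y - k for y, k in zip(ys, k_hat)]    # Y - k_hat(X)
    rz = [z - e for z, e in zip(zs, e_hat)]    # Z - e_hat(X)
    draws = []
    for _ in range(n_draws):
        # Step 2: fresh Dirichlet weights; h_hat stays fixed throughout.
        w = dirichlet_weights(len(data), rng)
        # The score is linear in theta, so the weighted equation has a closed form.
        num = sum(wi * a * b for wi, a, b in zip(w, ry, rz))
        den = sum(wi * b * b for wi, b in zip(w, rz))
        draws.append(num / den)
    return draws


rng = random.Random(1)
draws = algorithm_1(simulate(1000, 3.0, rng), 500, rng)
print("posterior mean:", statistics.mean(draws))
print("posterior sd:  ", statistics.stdev(draws))
```

For scores that are not linear in $\theta$, the closed-form step would be replaced by a numerical root-finder; the rest of the loop is unchanged.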
Even though there are no theoretical guarantees for the convergence of the bootstrapped term, we implement Algorithm 1 for this model. The score used in this study is the partialled-out score
\[ m(O;\theta_0,h_0) = \left[ Y - k_{0y}(X) - \theta_0\{ Z - e_0(X) \} \right] \{ Z - e_0(X) \}, \]
where $h_0 = (k_{0y}, e_0)$; see Robinson (1988). We compute $f(t) = E_{P_O}[ m\{ O;\theta_0, h_0 + t(h - h_0) \} ]$ in detail in the Supplementary Material and show that
\[ f(t) = t^2\, E_{P_O}\left[ \{ k_y(X) - k_{0y}(X) \}\{ e(X) - e_0(X) \} - \theta_0 \{ e(X) - e_0(X) \}^2 \right], \]
and explain how it encapsulates the required rates of convergence of the nuisance parameter estimators to their respective true values. Similar calculations are found in Chernozhukov et al. (2018); however, our method is slightly different.

We run a simulation across 1000 replicate analyses for different sample sizes $n$. For each replicate, we implement Algorithm 1 and derive the posterior distribution for the targeted parameter using 1000 Bayesian bootstrap samples. We report the results in Table 1, which underscores the Bayesian/frequentist duality proved in Theorem 2. Additional numerical results where the nuisance parameters are fitted using conventional non-parametric methods are also found in the Supplementary Material. Additional calculations and simulations concerning the AIPW estimator of the ATE are found in the Supplementary Material.

Table 1: Characteristics of the frequentist and Bayesian distributions

n                                      250            500            1000
Average of the Posterior Means         3.01           3.02           3.00
Empirical Frequentist Mean             3.01           3.02           3.00
Average of Posterior Variances (×n)    5.36           5.09           4.98
Empirical Frequentist Variance (×n)    5.08           4.71           4.88
Average Sandwich Estimate              5.38           5.11           4.99
Average Bayesian credible interval     (2.73, 3.30)   (2.82, 3.22)   (2.87, 3.15)
Frequentist confidence interval        (2.74, 3.30)   (2.83, 3.21)   (2.87, 3.15)
Posterior Coverage (%)                 94.80          95.30          94.30

5 Discussion

5.1 Neyman Orthogonality and Cutting Feedback

The Neyman orthogonality condition has been interpreted in the frequentist literature as insensitivity to biased (but consistent) estimation of the nuisance parameter $h_0$. In light of the results presented above, we give Neyman orthogonality a Bayesian interpretation as a guarantee of the robustness of the posterior to biased but consistent estimation of $h_0$ and of the insensitivity of the posterior to uncertainty propagation. That is, the posterior computed through the Bayesian bootstrap satisfies a Bayesian/frequentist duality statement despite the nuisance parameter being estimated only once. Cutting feedback using a conventional plug-in approach is hence a valid way to perform Bayesian inference, provided a Neyman orthogonal score is used.

It is important to reiterate that, in most parametric models, even in purely frequentist settings, estimating a nuisance parameter will affect the variance of the estimator of the parameter of interest. Many scores used in parametric statistics are not orthogonal with respect to the nuisance parameter. In the parametric case, the posterior distribution of the parameter of interest will generally be affected by the use of a plug-in estimate of the nuisance parameter. The asymptotic Normality of this posterior, however, is always guaranteed, even if the posterior has a higher than desired variance and exhibits inadequate coverage (Sabbagh & Stephens, 2026). When the nuisance parameter is estimated non-parametrically, asymptotic Normality of the posterior may not be secured. Nevertheless, imposing Neyman orthogonality of the score function, coupled with reasonable requirements on the nuisance estimator, will lead to an asymptotically Normal posterior distribution unaffected by the estimation of $h_0$.
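The orthogonality of the partialled-out score can be checked numerically by estimating $f(t)$ by Monte Carlo and differentiating at $t = 0$. The sketch below makes several assumptions that are not in the text: a toy model with a scalar covariate, the perturbation directions $(k_y - k_{0y})(x) = 1 + x$ and $(e - e_0)(x) = 0.1$, and, purely for contrast, a naive score $(Y - \theta Z - g(X)) Z$ that is not Neyman orthogonal.

```python
import math
import random

THETA0 = 3.0


def simulate(n, rng):
    """Toy partially linear model; the true nuisance values are kept for the check."""
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        e0 = 0.5 / (1.0 + math.exp(-2.0 * x)) + 0.2
        g0 = math.sin(3.0 * x)
        z = 1.0 if rng.random() < e0 else 0.0
        y = THETA0 * z + g0 + rng.gauss(0.0, 1.0)
        rows.append((x, z, y, e0, g0))
    return rows


def f_orthogonal(rows, t):
    """Monte Carlo f(t) for the partialled-out score along an assumed direction."""
    total = 0.0
    for x, z, y, e0, g0 in rows:
        k = THETA0 * e0 + g0 + t * (1.0 + x)   # k_0y(X) + t (k_y - k_0y)(X)
        e = e0 + t * 0.1                       # e_0(X) + t (e - e_0)(X)
        total += (y - k - THETA0 * (z - e)) * (z - e)
    return total / len(rows)


def f_naive(rows, t):
    """Monte Carlo f(t) for the non-orthogonal score (Y - theta Z - g(X)) Z."""
    total = 0.0
    for x, z, y, e0, g0 in rows:
        g = g0 + t * (1.0 + x)                 # g_0(X) + t (g - g_0)(X)
        total += (y - THETA0 * z - g) * z
    return total / len(rows)


def central_difference(f, rows, t=0.1):
    """Finite-difference estimate of f'(0), using the same sample at +t and -t."""
    return (f(rows, t) - f(rows, -t)) / (2.0 * t)


rng = random.Random(3)
rows = simulate(100_000, rng)
d_orth = central_difference(f_orthogonal, rows)
d_naive = central_difference(f_naive, rows)
# For this direction the formula for f gives f(1) = E{(1 + X)(0.1)} - 3 (0.1)^2 = 0.07.
curvature = f_orthogonal(rows, 1.0)
print("f'(0), partialled-out score:", d_orth)
print("f'(0), naive score:         ", d_naive)
print("f(1),  partialled-out score:", curvature)
```

Up to Monte Carlo error, the derivative of the partialled-out score should vanish while that of the naive score should not, which is the sense in which only the former is insensitive to first-order errors in the nuisance estimate.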
If the conclusion of Lemma 3 holds, but Neyman orthogonality is not satisfied, and if instead $\sqrt{n}\, E_{P_O}\{ m(O;\theta_0,\hat h) \}$ is assumed to be bounded in probability, we obtain that
\[ \sqrt{n}\,(\hat\theta_n - \theta_0) = -M_{\theta_0}^{-1} \frac{1}{\sqrt{n}}\sum_{i=1}^n m(O_i;\theta_0,h_0) - M_{\theta_0}^{-1} \sqrt{n}\, E_{P_O}\{ m(O;\theta_0,\hat h) \} + o_{P_O}(1), \]
\[ \sqrt{n}\,(\hat\theta_{n,BB} - \theta_0) = -M_{\theta_0}^{-1} \sqrt{n}\sum_{i=1}^n w_{in}\, m(O_i;\theta_0,h_0) - M_{\theta_0}^{-1} \sqrt{n}\, E_{P_O}\{ m(O;\theta_0,\hat h) \} + o_{P_{OW}}(1). \]
Therefore, the terms involving $\hat h$ cancel and
\[ \sqrt{n}\,(\hat\theta_{n,BB} - \hat\theta_n) = -M_{\theta_0}^{-1} \sqrt{n}\sum_{i=1}^n \left( w_{in} - \frac{1}{n} \right) m(O_i;\theta_0,h_0) + o_{P_{OW}}(1), \]
which implies that, almost surely, as $n \to \infty$,
\[ \sqrt{n}\,(\hat\theta_{n,BB} - \hat\theta_n) \mid O_1,\dots,O_n \to N(0,\Sigma). \]
While this result is appealing at first, it transpires that the posterior may manifest inadequate coverage and inflated variance. The Bayesian/frequentist duality may not hold. In fact, if $\sqrt{n}\, E_{P_O}\{ m(O;\theta_0,\hat h) \}$ is regular and asymptotically linear with a non-zero influence function (Newey, 1994), the asymptotic frequentist variance of $\sqrt{n}(\hat\theta_n - \theta_0)$ may be no larger than $\Sigma$.

5.2 Debiasing and Convergence of the Posterior

The frequentist estimator $\hat\theta_n$, the solution of $\sum_{i=1}^n m(O_i;\theta,\hat h) = 0$, provides the correct way of debiasing the posterior regardless of whether the score is orthogonal. Such calculations were essential in establishing results of similar flavor for two-step approaches in Sabbagh & Stephens (2026), where the asymptotic Normality of the posterior in the parametric setting is established, albeit with coverage and the Bayesian/frequentist duality compromised. The following example underlines that the posterior under the Bayesian bootstrap converges under rather minor restrictions. This convergence is revealed provided the debiasing is carried out correctly.
We suppose that the score $m$ identifying a real-valued parameter $\theta$ has the form
\[
m(O;\theta,h) = \theta - B(O;h),
\]
for some function $B : \mathcal{O} \times \mathcal{H} \to \mathbb{R}$, where $E_{P_O}\{m(O;\theta_0,h_0)\} = 0$. Let $\hat h$ be an estimator of $h_0$. The respective solutions to $\sum_{i=1}^n m(O_i;\theta,\hat h) = 0$ and $\sum_{i=1}^n w_{in}\, m(O_i;\theta,\hat h) = 0$ are
\[
\hat\theta_n = \frac{1}{n}\sum_{i=1}^n B(O_i;\hat h) \quad\text{and}\quad \hat\theta_{n,BB} = \sum_{i=1}^n w_{in}\, B(O_i;\hat h).
\]
Then the following theorem holds.

Theorem 3. If Assumption 5 or any of its possible alternatives holds, then almost surely, as $n \to \infty$,
\[
\sqrt{n}\left(\hat\theta_{n,BB} - \hat\theta_n\right) \mid O_1,\ldots,O_n \to \mathcal{N}\left(0,\, E_{P_O}\{m(O;\theta_0,h_0)^2\}\right).
\]

Proof. We write
\[
\sqrt{n}\left(\hat\theta_{n,BB} - \hat\theta_n\right) = \sqrt{n}\sum_{i=1}^n \left(w_{in} - \frac{1}{n}\right) B(O_i;\hat h)
= \sqrt{n}\sum_{i=1}^n \left(w_{in} - \frac{1}{n}\right)\left\{B(O_i;\hat h) - B(O_i;h_0)\right\} + \sqrt{n}\sum_{i=1}^n \left(w_{in} - \frac{1}{n}\right) B(O_i;h_0).
\]
The first term on the right-hand side can be assumed to go to zero, under Donsker assumptions, say. Such results typically merely require $\|\hat h - h_0\|_{\mathcal{H}} = o_{P_O}(1)$, rather than the faster rate of $o_{P_O}(n^{-1/4})$. We then obtain that, almost surely,
\[
\sqrt{n}\left(\hat\theta_{n,BB} - \hat\theta_n\right) \mid O_1,\ldots,O_n \to \mathcal{N}\left(0,\, E_{P_O}\{m(O;\theta_0,h_0)^2\}\right),
\]
irrespective of whether the score $m$ is Neyman orthogonal or not. □

The conditional mean of $\hat\theta_{n,BB}$, which is seen to be $\hat\theta_n$, provides the most natural way to debias $\hat\theta_{n,BB}$. It may be tempting to debias $\hat\theta_{n,BB}$ by quantities that do not depend on $\hat h$, such as $n^{-1}\sum_{i=1}^n B(O_i;h_0)$. Such debiasing is not optimal, as it is not based on the conditional mean and does not allow recovery of the results of Theorem 3 without imposing additional unnecessary assumptions. The asymptotic posterior variance corresponds to the asymptotic variance of the frequentist and Bayesian estimators based on the equations $\sum_{i=1}^n m(O_i;\theta,h_0) = 0$ and $\sum_{i=1}^n w_{in}\, m(O_i;\theta,h_0) = 0$.
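The affine-score example above can be sketched numerically. The following is a minimal illustration rather than the authors' implementation: it assumes Dirichlet(1, ..., 1) Bayesian bootstrap weights, and the function name `bayesian_bootstrap_affine`, the toy choice $B(O;h) = O - h$ and the plug-in value `h_hat` are all hypothetical. It compares the Bayesian bootstrap draws, centered at the plug-in estimate, with the sample variance of $B(O_i;\hat h)$.

```python
import numpy as np


def bayesian_bootstrap_affine(b_values, n_draws=2000, seed=0):
    """Posterior draws of theta for the score m(O; theta, h) = theta - B(O; h).

    b_values holds B(O_i; h_hat) for a nuisance estimate h_hat computed once.
    Weights are Dirichlet(1, ..., 1), an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    b_values = np.asarray(b_values, dtype=float)
    n = b_values.shape[0]
    weights = rng.dirichlet(np.ones(n), size=n_draws)  # each row sums to 1
    theta_bb = weights @ b_values  # solves sum_i w_i (theta - B_i) = 0
    theta_hat = b_values.mean()    # solves sum_i (theta - B_i) = 0
    return theta_hat, theta_bb


# Hypothetical toy data: B(O; h) = O - h with a crude fixed plug-in for h.
rng = np.random.default_rng(1)
n = 500
obs = rng.normal(loc=3.0, scale=2.0, size=n)
h_hat = 0.1  # illustrative nuisance estimate, not taken from the paper

theta_hat, theta_bb = bayesian_bootstrap_affine(obs - h_hat)
scaled_post_var = n * theta_bb.var()  # variance of sqrt(n) * (theta_bb - theta_hat)
sample_var = (obs - h_hat).var()      # empirical counterpart of E{m^2}
```

In this sketch the draws are centered at `theta_hat` whatever the value of `h_hat`, and `scaled_post_var` should be close to `sample_var`, which mirrors the role of the conditional mean as the debiasing quantity.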
These fundamental calculations reveal that, under weak restrictions, posterior distributions are asymptotically not influenced by the nuisance parameter, irrespective of the score used. Recovering the Bayesian/frequentist duality and coverage at the nominal level depends on $\hat\theta_n$, the frequentist estimator, and its properties, which are dictated by the Neyman orthogonality of the score $m$ and a fast enough convergence rate of $\hat h$ to $h_0$. This highlights the fact that ensuring an asymptotically normal posterior with the desired variance is not enough to ensure valid confidence intervals. The existence of a 'bootstrap consistency' statement leading to a 'validity of the bootstrap' conclusion, and the pairing of a conditional quantity to a corresponding unconditional quantity, are required; see Cheng & Huang (2010).

5.3 Future Work

The Donsker assumptions attached to the nuisance space played a crucial role in establishing the properties of the frequentist and Bayesian estimators of the targeted parameter of interest. Indeed, the properties of the exchangeable bootstrap for Donsker classes enable us to swiftly prove Lemma 3. With the rapid expansion of machine learning algorithms, interest has expanded beyond reliance on Donsker assumptions. It is well documented in Chernozhukov et al. (2018) that Donsker classes of functions may fail to correctly model nuisance functions that depend on a large number of covariates. Approaches that allow relaxation of the Donsker assumptions in order to incorporate a larger class of flexible methods as estimators for the nuisance parameters have been proposed in Sabbagh (2025). These approaches deploy cross-fitting procedures that allow the Bayesian/frequentist duality to be maintained even if Donsker conditions do not apply, such as when the dimensionality of the predictor space is high.
Acknowledgement

The authors acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Institut des Sciences Mathématiques (ISM).

Supplementary Material

The Supplementary Material includes the assumptions on the relevant probability spaces, a proof of the main theorem without the differentiability of the score, calculations on the partially linear and partial interactive models, and additional simulations concerning the AIPW estimator.

S.1 Probability Spaces

The following assumptions specify the relevant probability spaces required in the theoretical statements. We assume:

1. The data are realizations from the probability space $(\mathcal{O}, B, P_O)$.
2. The Bayesian bootstrap weights are defined on a probability space $(\mathcal{W}, \mathcal{C}, P_W)$.
3. Observation $O_i$ is the $i$-th coordinate of the canonical projection from $(\mathcal{O}^\infty, B^\infty, P_O^\infty)$.
4. The joint randomness, coming from the observed data and from the weights, is defined on the product probability space
\[
(\mathcal{O}^\infty, B^\infty, P_O^\infty) \times (\mathcal{W}, \mathcal{C}, P_W) = (\mathcal{O}^\infty \times \mathcal{W},\; B^\infty \times \mathcal{C},\; P_O^\infty \times P_W).
\]

The joint probability measure is hence $P_{OW} = P_O^\infty \times P_W$. We write $P_O$ in lieu of $P_O^\infty$ for simplicity when necessary.

S.2 Proof of Lemma 1

Proof of Lemma 1. The proof consists of a second-order expansion of $f$ around 0. By Taylor's theorem with integral remainder (see Sherbert & Bartle, 2020), we have that
\[
f(1) = f(0) + f'(0) + \int_0^1 (1-s)\, f''(s)\, ds.
\]
Note that $f(1) = E_{P_O}\{m(O;\theta_0,h)\}$, that $f(0) = E_{P_O}\{m(O;\theta_0,h_0)\} = 0$, and that $f'(0) = 0$ by Neyman orthogonality. Moreover,
\[
\left\| \int_0^1 (1-s)\, f''(s)\, ds \right\|_{p,2} \le \int_0^1 (1-s)\, \|f''(s)\|_{p,2}\, ds \le \int_0^1 \|f''(s)\|_{p,2}\, ds.
\]
Therefore,
\[
\sqrt{n}\, \|f(1)\|_{p,2} \le \sqrt{n} \int_0^1 \|f''(s)\|_{p,2}\, ds \to 0.
\]
This shows that, uniformly in $h$, we have $\sqrt{n}\,E_{P_O}\{m(O;\theta_0,h)\} \to 0$, which in turn implies that $\sqrt{n}\,E_{P_O}\{m(O;\theta_0,\hat h)\}$ converges to 0 in probability as $n \to \infty$. □

S.3 Extension to non-differentiable scores

We use the same framework as the one in Section 3.4 of the main paper. We suppose that $E_{P_O}\{m(O;\theta_0,h_0)\} = 0$, where $m$ is a Neyman orthogonal score with respect to $h$. Let $\hat h$ be an estimator of $h_0$ and denote by $\hat\theta_n$ and $\hat\theta_{n,BB}$ the respective solutions to
\[
\sum_{i=1}^n m(O_i;\hat\theta_n,\hat h) = 0 \quad\text{and}\quad \sum_{i=1}^n w_{in}\, m(O_i;\hat\theta_{n,BB},\hat h) = 0.
\]
Instead of assuming that $m$ is differentiable with respect to $\theta$, we assume that the function $(\theta,h) \mapsto E_{P_O}[m\{O;\theta,h\}]$ is differentiable with respect to $\theta$ and $h$. Concretely, we assume that the function
\[
g(s) = E_{P_O}[m\{O;\theta_0 + s(\theta - \theta_0),\, h_0 + s(h - h_0)\}]
\]
is differentiable at 0 for every $h \in \mathcal{H}$ and $\theta \in \Theta$. The targeted parameter space $\Theta$ is a convex subset of $\mathbb{R}^p$, and $\theta_0$ is in the interior of $\Theta$. Similar constraints can be placed on the nuisance space $\mathcal{H}$. The main idea is to be able to define the derivative of $g(s)$ at $s = 0$. We remark that $g(0) = E_{P_O}\{m(O;\theta_0,h_0)\} = 0$ by assumption and that $g'(0) = M_{\theta_0}(\theta - \theta_0)$ by Neyman orthogonality of $m$ with respect to $h$, where
\[
M_{\theta_0} = \left.\frac{\partial E_{P_O}\{m(O;\theta,h_0)\}}{\partial \theta^\top}\right|_{\theta = \theta_0}.
\]
By also assuming that $M_{\theta_0}$ is invertible, and by defining
\[
\Sigma = M_{\theta_0}^{-1}\, E_{P_O}\{m(O;\theta_0,h_0)\, m(O;\theta_0,h_0)^\top\}\, M_{\theta_0}^{-\top},
\]
we show that $\sqrt{n}(\hat\theta_n - \theta_0)$ and $\sqrt{n}(\hat\theta_{n,BB} - \hat\theta_n) \mid O_1,\ldots,O_n$ converge in distribution to random variables with $\mathcal{N}(0,\Sigma)$ distributions (almost surely). We make the following assumptions.

Assumption 7. $h_0$ is consistently estimated by $\hat h \in \mathcal{H}_n$, where $\{\mathcal{H}_i\}_{i=1}^\infty$ are shrinking neighborhoods of $h_0$.

Assumption 8. $\hat\theta_{n,BB}$ and $\hat\theta_n$ are unconditionally consistent estimators of $\theta_0$.

Assumption 9. The class $\mathcal{G} = \{m(O;\theta,h) : \|\theta - \theta_0\|_{p,2} < \delta_1,\ \|h - h_0\|_{\mathcal{H}} < \delta_2\}$ is $P_O$-Donsker for some $\delta_1, \delta_2 > 0$, and $E_{P_O}\|m(O;\theta,h) - m(O;\theta_0,h_0)\|_{p,2}^2 \to 0$ as $\|h - h_0\|_{\mathcal{H}} \to 0$ and $\|\theta - \theta_0\|_{p,2} \to 0$.

Assumption 10. The function $g(s) = E_{P_O}[m\{O;\theta_0 + s(\theta - \theta_0),\, h_0 + s(h - h_0)\}]$ satisfies $\sqrt{n}\int_0^1 \|g''(s)\|_{p,2}\, ds \to 0$ as $n \to \infty$, uniformly in $\theta$ and $h$.

Hence, if Assumption 10 holds, and proceeding as in the proof of Lemma 1, we obtain that
\[
\sqrt{n}\, E_{P_O}\{m(O;\theta,h)\} - M_{\theta_0}\sqrt{n}(\theta - \theta_0)
\]
converges to 0 as $n \to \infty$, uniformly in $\theta$ and $h$. By considering the expansion below, and using the properties of the exchangeable bootstrap for Donsker classes, we obtain
\begin{align*}
0 &= \sqrt{n}\sum_{i=1}^n w_{in}\, m(O_i;\hat\theta_{n,BB},\hat h) \\
&= \sqrt{n}\sum_{i=1}^n w_{in}\, m(O_i;\theta_0,h_0)
 + \sqrt{n}\sum_{i=1}^n \left(w_{in} - \frac{1}{n}\right)\left\{m(O_i;\hat\theta_{n,BB},\hat h) - m(O_i;\theta_0,h_0)\right\} \\
&\quad + \sqrt{n}\left[\frac{1}{n}\sum_{i=1}^n \left\{m(O_i;\hat\theta_{n,BB},\hat h) - m(O_i;\theta_0,h_0)\right\} - E_{P_O}\left\{m(O;\hat\theta_{n,BB},\hat h) - m(O;\theta_0,h_0)\right\}\right] \\
&\quad + \sqrt{n}\, E_{P_O}\{m(O;\hat\theta_{n,BB},\hat h)\} \\
&= \sqrt{n}\sum_{i=1}^n w_{in}\, m(O_i;\theta_0,h_0) + \sqrt{n}\, E_{P_O}\{m(O;\hat\theta_{n,BB},\hat h)\} + o_{P_{OW}}(1) \\
&= \sqrt{n}\sum_{i=1}^n w_{in}\, m(O_i;\theta_0,h_0) + M_{\theta_0}\sqrt{n}\left(\hat\theta_{n,BB} - \theta_0\right) + o_{P_{OW}}(1).
\end{align*}
Therefore,
\[
\sqrt{n}\left(\hat\theta_{n,BB} - \theta_0\right) = -M_{\theta_0}^{-1}\sqrt{n}\sum_{i=1}^n w_{in}\, m(O_i;\theta_0,h_0) + o_{P_{OW}}(1).
\]
In a similar manner, we obtain that
\[
\sqrt{n}(\hat\theta_n - \theta_0) = -M_{\theta_0}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n m(O_i;\theta_0,h_0) + o_{P_O}(1),
\]
which results in
\[
\sqrt{n}(\hat\theta_{n,BB} - \hat\theta_n) = -M_{\theta_0}^{-1}\sqrt{n}\sum_{i=1}^n \left(w_{in} - \frac{1}{n}\right) m(O_i;\theta_0,h_0) + o_{P_{OW}}(1),
\]
establishing the desired Bayesian/frequentist duality.
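As a supplementary moment check on the last display (not a substitute for the weak convergence argument), one can compute the conditional variance of the leading term directly. The calculation below assumes that the weights are $(w_{1n},\ldots,w_{nn}) \sim \mathrm{Dirichlet}(1,\ldots,1)$, independent of the data; the shorthand $m_i = m(O_i;\theta_0,h_0)$ and $\bar m = n^{-1}\sum_{i=1}^n m_i$ is introduced here for illustration only.

```latex
% Moments of Dirichlet(1,...,1) weights (assumed form of the Bayesian bootstrap):
\[
E(w_{in}) = \frac{1}{n}, \qquad
\operatorname{Var}(w_{in}) = \frac{n-1}{n^2(n+1)}, \qquad
\operatorname{Cov}(w_{in}, w_{jn}) = -\frac{1}{n^2(n+1)} \quad (i \neq j).
\]
% Conditional variance of the leading term, given O_1, ..., O_n:
\[
\operatorname{Var}\left\{ \sqrt{n}\sum_{i=1}^n \left(w_{in} - \frac{1}{n}\right) m_i \,\Big|\, O_1,\ldots,O_n \right\}
= \frac{n}{n+1} \cdot \frac{1}{n}\sum_{i=1}^n (m_i - \bar m)(m_i - \bar m)^\top .
\]
% If E(m m^T) is finite, the right-hand side converges almost surely to
% E_{P_O}{ m(O; theta_0, h_0) m(O; theta_0, h_0)^T }, since E_{P_O}(m) = 0.
```

Pre- and post-multiplying by $M_{\theta_0}^{-1}$ and $M_{\theta_0}^{-\top}$ then gives $\Sigma$ in the limit, which matches the variance of the frequentist expansion.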
S.4 Additional Calculations

In what follows, we furnish some examples of models and various parameters of interest that admit an orthogonal score, and give practical insights on how to efficiently check the assumptions related to $f(t) = E_{P_O}[m\{O;\theta_0,\, h_0 + t(h - h_0)\}]$. Chernozhukov et al. (2018) is extensive in its examples showing these fundamental calculations, and some calculations shown below have been done therein. However, we provide some tips that may render these calculations slightly simpler and more manageable.

S.4.1 Partially linear model

Consider the structural model
\[
Y = \theta_0 Z + g_0(X) + U, \qquad E_{P_O}(U \mid X, Z) = 0, \qquad Z \mid X \sim \mathrm{Bern}\{e_0(X)\}.
\]
We first begin by looking at the partialled-out score of Robinson (1988),
\[
m(O;\theta_0,h_0) = \left[ Y - k_{0y}(X) - \theta_0\{Z - e_0(X)\} \right]\{Z - e_0(X)\},
\]
where $k_{0y}(x) = E_{P_O}(Y \mid X = x) = \theta_0 e_0(x) + g_0(x)$. A useful approach while computing $f(t)$ is to simplify the expression to the largest possible extent before we embark on differentiating. In other words, we can try to simplify $f(t)$ and keep only nuisance parameters (although it may not be necessary in some cases) in its final expression. We illustrate this with the relatively simple example of the partialled-out score. The simplification provided by this method is amplified in the harder example of subsection S.4.2. In this case, the true nuisance parameter is $h_0 = (k_{0y}, e_0)$.

Proposition 1. Let $f(t) = E_{P_O}[m\{O;\theta_0,\, h_0 + t(h - h_0)\}]$, where $h = (k_y, e)$. Then,
\[
f(t) = t^2\, E_{P_O}\left[ \{k_y(X) - k_{0y}(X)\}\{e(X) - e_0(X)\} - \theta_0\{e(X) - e_0(X)\}^2 \right].
\]
In order to establish this equality, we use the facts that $E_{P_O}[V \mid X] = 0$, where $V = Z - e_0(X)$, and $E_{P_O}[U \mid X, Z] = 0$.
One can see that
\[
f(t) = E_{P_O}\left( \left[ U - t\{k_y(X) - k_{0y}(X)\} + \theta_0 t\{e(X) - e_0(X)\} \right]\left[ V - t\{e(X) - e_0(X)\} \right] \right),
\]
which simplifies to
\[
f(t) = t^2\, E_{P_O}\left[ \{k_y(X) - k_{0y}(X)\}\{e(X) - e_0(X)\} - \theta_0\{e(X) - e_0(X)\}^2 \right].
\]
It becomes quite clear that $f(0) = 0$, $f'(0) = 0$ and $f''(t) = f''(0)$ for all $t$, and hence, by the Cauchy-Schwarz inequality,
\[
\int_0^1 |f''(t)|\, dt \le 2\sqrt{E_{P_O}\{k_y(X) - k_{0y}(X)\}^2\; E_{P_O}\{e(X) - e_0(X)\}^2} + 2|\theta_0|\, E_{P_O}\{e(X) - e_0(X)\}^2,
\]
which implies that $\sqrt{n}\int_0^1 |f''(t)|\, dt \to 0$ if the nuisance parameters are estimated at a root-mean-squared error rate $o(n^{-1/4})$.

S.4.2 Partial interactive model

We now consider a generalized version of the partially linear model studied in Robinson (1988). In this model, interactions between $X$ (confounders) and $Z$ (treatment) are allowed, and the outcome model is not assumed to be separable into a function of $X$ and a scalar multiple of $Z$. Suppose the structural model is
\[
Y = \mu_0(Z, X) + U, \qquad E_{P_O}[U \mid X, Z] = 0, \qquad Z \mid X \sim \mathrm{Bern}\{e_0(X)\}.
\]
The following assumption, called positivity or overlap, is prevalent in causal inference. It consists of bounding (almost surely) $\epsilon < P(Z = 1 \mid X) < 1 - \epsilon$ for some $0 < \epsilon < 1/2$. In practice, it means that treatment assignment is not deterministic. In theory, this property guarantees the bounded behavior of terms of the form $1/e(X)$ and $1/\{1 - e(X)\}$. We will assume that $e_0$ satisfies the positivity property and that the elements of the nuisance space satisfy it as well. The average treatment effect $\theta_0$ in this model is equal to
\[
\theta_0 = E_{P_O}\left[ \mu_0(1,X) - \mu_0(0,X) + \frac{Z\{Y - \mu_0(1,X)\}}{e_0(X)} - \frac{(1 - Z)\{Y - \mu_0(0,X)\}}{1 - e_0(X)} \right].
\]
In fact, this moment restriction leads to the well-known AIPW (Augmented Inverse Probability Weighting) estimator introduced in Robins et al. (1995).
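The moment restriction above can be turned into a plug-in estimate and a set of Bayesian bootstrap draws, as in the following minimal sketch. It is an illustration under stated assumptions, not the authors' Algorithm 1: the weights are assumed to be Dirichlet(1, ..., 1), the function names `aipw_terms` and `aipw_bayesian_bootstrap` are hypothetical, and the toy example passes the true nuisance functions in place of fitted values.

```python
import numpy as np


def aipw_terms(y, z, mu1, mu0, e):
    """Per-observation AIPW term psi_i, so that m(O_i; theta, h) = psi_i - theta."""
    return mu1 - mu0 + z * (y - mu1) / e - (1 - z) * (y - mu0) / (1 - e)


def aipw_bayesian_bootstrap(y, z, mu1, mu0, e, n_draws=1000, seed=0):
    """Plug-in AIPW estimate and Bayesian bootstrap draws for the ATE."""
    psi = aipw_terms(y, z, mu1, mu0, e)
    rng = np.random.default_rng(seed)
    # Dirichlet(1, ..., 1) weights: an assumption made for this sketch.
    w = rng.dirichlet(np.ones(psi.shape[0]), size=n_draws)
    # The score is affine in theta, so each weighted equation has a closed form.
    return psi.mean(), w @ psi


# Hypothetical toy data with a true average treatment effect of 3.
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
e0 = np.clip(1.0 / (1.0 + np.exp(-x)), 0.05, 0.95)  # positivity enforced
z = rng.binomial(1, e0)
y = 3.0 * z + x + rng.normal(size=n)

# The true nuisance functions stand in for fitted values of mu and e.
theta_hat, draws = aipw_bayesian_bootstrap(y, z, mu1=3.0 + x, mu0=x, e=e0)
lo, hi = np.quantile(draws, [0.025, 0.975])  # 95% credible interval
```

In practice `mu1`, `mu0` and `e` would be replaced by nuisance estimates computed once from the data, with the propensity estimate kept away from 0 and 1 in line with the positivity assumption.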
We compute $f(t) = E_{P_O}[m\{O;\theta_0,\, h_0 + t(h - h_0)\}]$. It is better in such calculations to simplify $f(t)$ as much as possible before differentiating.

Proposition 2. Let $f(t) = E_{P_O}[m\{O;\theta_0,\, h_0 + t(h - h_0)\}]$, where
\[
m(O;\theta_0,h_0) = -\theta_0 + \mu_0(1,X) - \mu_0(0,X) + \frac{Z\{Y - \mu_0(1,X)\}}{e_0(X)} - \frac{(1 - Z)\{Y - \mu_0(0,X)\}}{1 - e_0(X)}.
\]
Then:

1. 
\begin{align*}
f(t) &= E_{P_O}[\mu_0(1,X) + t\{\mu(1,X) - \mu_0(1,X)\}] - E_{P_O}[\mu_0(0,X) + t\{\mu(0,X) - \mu_0(0,X)\}] \\
&\quad + E_{P_O}\left[ \frac{-t\, e_0(X)\{\mu(1,X) - \mu_0(1,X)\}}{e_0(X) + t\{e(X) - e_0(X)\}} \right]
 + E_{P_O}\left[ \frac{t\,\{1 - e_0(X)\}\{\mu(0,X) - \mu_0(0,X)\}}{1 - e_0(X) - t\{e(X) - e_0(X)\}} \right] - \theta_0 .
\end{align*}

2. 
\begin{align*}
f'(t) &= E_{P_O}\{\mu(1,X) - \mu_0(1,X)\} - E_{P_O}\{\mu(0,X) - \mu_0(0,X)\} \\
&\quad + E_{P_O}\left[ \frac{-e_0(X)^2\{\mu(1,X) - \mu_0(1,X)\}}{[e_0(X) + t\{e(X) - e_0(X)\}]^2} \right]
 + E_{P_O}\left[ \frac{\{1 - e_0(X)\}^2\{\mu(0,X) - \mu_0(0,X)\}}{[1 - e_0(X) - t\{e(X) - e_0(X)\}]^2} \right].
\end{align*}

3. 
\begin{align*}
f''(t) &= 2\, E_{P_O}\left[ \frac{e_0(X)^2\{\mu(1,X) - \mu_0(1,X)\}\{e(X) - e_0(X)\}}{[e_0(X) + t\{e(X) - e_0(X)\}]^3} \right] \\
&\quad + 2\, E_{P_O}\left[ \frac{\{1 - e_0(X)\}^2\{\mu(0,X) - \mu_0(0,X)\}\{e(X) - e_0(X)\}}{[1 - e_0(X) - t\{e(X) - e_0(X)\}]^3} \right].
\end{align*}

4. In particular, $f(0) = 0$, $f'(0) = 0$ and $\sqrt{n}\int_0^1 |f''(t)|\, dt \to 0$ as $n \to \infty$, provided the nuisance parameters converge at a root-mean-squared error rate of order $o(n^{-1/4})$, in probability.

Proof.
\begin{align*}
f(t) &= E_{P_O}[\mu_0(1,X) + t\{\mu(1,X) - \mu_0(1,X)\}] - E_{P_O}[\mu_0(0,X) + t\{\mu(0,X) - \mu_0(0,X)\}] \\
&\quad + E_{P_O}\left( \frac{Z\,[Y - \mu_0(1,X) - t\{\mu(1,X) - \mu_0(1,X)\}]}{e_0(X) + t\{e(X) - e_0(X)\}} \right) \\
&\quad - E_{P_O}\left( \frac{(1 - Z)\,[Y - \mu_0(0,X) - t\{\mu(0,X) - \mu_0(0,X)\}]}{1 - e_0(X) - t\{e(X) - e_0(X)\}} \right) - \theta_0 .
\end{align*}
Instead of differentiating immediately, we can simplify $f$ by using the properties $E_{P_O}[U \mid X, Z] = 0$ and $E_{P_O}[Z \mid X] = e_0(X)$, and by noting that $Y = \mu_0(Z,X) + U$. Conditionally on $X$, the numerator of the third term has mean $-t\, e_0(X)\{\mu(1,X) - \mu_0(1,X)\}$ and the numerator of the fourth term has mean $-t\,\{1 - e_0(X)\}\{\mu(0,X) - \mu_0(0,X)\}$. We obtain the following expression for $f(t)$:
\begin{align*}
f(t) &= E_{P_O}[\mu_0(1,X) + t\{\mu(1,X) - \mu_0(1,X)\}] - E_{P_O}[\mu_0(0,X) + t\{\mu(0,X) - \mu_0(0,X)\}] \\
&\quad + E_{P_O}\left[ \frac{-t\, e_0(X)\{\mu(1,X) - \mu_0(1,X)\}}{e_0(X) + t\{e(X) - e_0(X)\}} \right]
 + E_{P_O}\left[ \frac{t\,\{1 - e_0(X)\}\{\mu(0,X) - \mu_0(0,X)\}}{1 - e_0(X) - t\{e(X) - e_0(X)\}} \right] - \theta_0 .
\end{align*}
We now differentiate $f$ term by term to get
\begin{align*}
f'(t) &= E_{P_O}\{\mu(1,X) - \mu_0(1,X)\} - E_{P_O}\{\mu(0,X) - \mu_0(0,X)\} \\
&\quad + E_{P_O}\left[ \frac{-e_0(X)^2\{\mu(1,X) - \mu_0(1,X)\}}{[e_0(X) + t\{e(X) - e_0(X)\}]^2} \right]
 + E_{P_O}\left[ \frac{\{1 - e_0(X)\}^2\{\mu(0,X) - \mu_0(0,X)\}}{[1 - e_0(X) - t\{e(X) - e_0(X)\}]^2} \right].
\end{align*}
Therefore,
\[
f'(0) = E_{P_O}\{\mu(1,X) - \mu_0(1,X)\} - E_{P_O}\{\mu(0,X) - \mu_0(0,X)\} - E_{P_O}\{\mu(1,X) - \mu_0(1,X)\} + E_{P_O}\{\mu(0,X) - \mu_0(0,X)\} = 0.
\]
Now, we compute $f''(t)$:
\begin{align*}
f''(t) &= 2\, E_{P_O}\left[ \frac{e_0(X)^2\{\mu(1,X) - \mu_0(1,X)\}\{e(X) - e_0(X)\}}{[e_0(X) + t\{e(X) - e_0(X)\}]^3} \right] \\
&\quad + 2\, E_{P_O}\left[ \frac{\{1 - e_0(X)\}^2\{\mu(0,X) - \mu_0(0,X)\}\{e(X) - e_0(X)\}}{[1 - e_0(X) - t\{e(X) - e_0(X)\}]^3} \right].
\end{align*}
By keeping in mind that the denominators are positive due to the positivity property, and after a simple integration of the absolute value of each of the terms on $[0,1]$, we can bound the integral of the absolute value of the first term by
\[
E_{P_O}\left[ \frac{|\mu(1,X) - \mu_0(1,X)|\, |e(X) - e_0(X)|\, |e(X) + e_0(X)|}{e(X)^2} \right]
\le \frac{2 - 2\epsilon}{\epsilon^2} \sqrt{E_{P_O}\{\mu(1,X) - \mu_0(1,X)\}^2\; E_{P_O}\{e(X) - e_0(X)\}^2},
\]
and the integral on $[0,1]$ of the absolute value of the second term by
\[
E_{P_O}\left[ \frac{|\mu(0,X) - \mu_0(0,X)|\, |e(X) - e_0(X)|\, |2 - e(X) - e_0(X)|}{\{1 - e(X)\}^2} \right]
\le \frac{2 - 2\epsilon}{\epsilon^2} \sqrt{E_{P_O}\{\mu(0,X) - \mu_0(0,X)\}^2\; E_{P_O}\{e(X) - e_0(X)\}^2},
\]
where the second inequality in each display follows from the Cauchy-Schwarz inequality. This implies that $\sqrt{n}\int_0^1 |f''(t)|\, dt \to 0$ as $n \to \infty$, provided the nuisance parameters converge at a root-mean-squared error rate of order $o(n^{-1/4})$, in probability. □

S.5 Additional Simulations

S.5.1 Simulations for the partially linear model using kernel methods

We consider the following model, where the covariate $X \sim \mathcal{N}(0,1)$:
\[
Y = \theta_0 Z + X + U, \qquad Z = \sin X + V,
\]
where $\theta_0 = 3$ and $U$, $V$ are independent standard Normal random variables. We have used kernel methods to estimate $E(Y \mid X)$ and $E(Z \mid X)$. We implement Algorithm 1 and derive the posterior distribution for the targeted parameter using 1000 Bayesian bootstrap samples, across 1000 replicate analyses. We report the results in Table S.2 of the Supplementary Material.

Table S.2: Characteristics of the frequentist and Bayesian distributions

$n$                                              250            500            1000
Average of the Posterior Means                   2.99           3.00           3.00
Empirical Frequentist Mean                       2.99           3.00           3.00
Average of Posterior Variances ($\times n$)      1.00           1.00           0.99
Empirical Frequentist Variance ($\times n$)      1.28           1.19           1.07
Average Sandwich Estimate                        1.00           1.00           1.00
Average Bayesian credible interval               (2.87, 3.12)   (2.91, 3.08)   (2.94, 3.06)
Frequentist confidence interval                  (2.86, 3.14)   (2.90, 3.09)   (2.94, 3.06)
Posterior Coverage                               92.90          93.60          94.50

S.5.2 Simulations for the AIPW estimator

We now consider the AIPW estimator of the average treatment effect. The data generating mechanism is the same as the one used in subsection 4 of the main paper. However, the estimation procedure differs as the fitted model involves estimating $\mu(Z,X) = \theta_0 Z + g_0(X)$ without assuming the additive relation between the treatment and the treatment-free part. We estimate $\mu_0(Z,X)$ and $e_0(X)$ using random forests with sub-sampling without replacement, with subsample size $m = n^{0.49}$. We implement Algorithm 1 and derive the posterior distribution for the targeted parameter using 1000 Bayesian bootstrap samples, across 1000 replicate analyses. We report the results in Table S.3 of the Supplementary Material.

Table S.3: Characteristics of the Bayesian and Frequentist Distributions of the ATE based on the AIPW Estimator

$n$                                              250            500            1000
Average of the Posterior Means                   2.96           2.99           3.00
Empirical Frequentist Mean                       2.96           2.99           3.00
Average of Posterior Variances ($\times n$)      5.11           5.20           5.09
Empirical Frequentist Variance ($\times n$)      4.85           5.08           4.87
Average Sandwich Estimate                        5.13           5.21           5.10
Average Bayesian credible interval               (2.68, 3.24)   (2.79, 3.19)   (2.86, 3.14)
Frequentist confidence interval                  (2.69, 3.22)   (2.79, 3.19)   (2.86, 3.14)
Posterior Coverage                               95.20          95.50          95.0

S.5.3 Impact of the number of covariates

Table S.4 of the Supplementary Material shows the impact of the number of covariates on the Bayesian and frequentist estimation. We point out that the results presented in Section 3.4 are of asymptotic nature, and assume that the sample size $n \to \infty$.
In finite samples, we may observe bias, which incites us to carefully examine whether the asymptotic arguments in Theorem 2 can be deployed. In order to document this bias, we refer to Table S.4 for a summary of the simulations for the model considered in subsection 4 of the main paper for five different values of $q \in \{5, 6, 8, 10, 20\}$ and for four different sample sizes $n \in \{250, 500, 1000, 2000\}$. We point out that $\tilde\theta_F$ and $\tilde V_F$ denote the empirical frequentist mean and variance (times $n$) respectively, and that $\tilde\theta_B$ and $\tilde V_B$ are respectively the averages of the empirical posterior means and variances (times $n$). $\hat\Sigma$ denotes the average of the sandwich estimates. It becomes apparent that, for a given model, increasing the sample size reduces the bias, as one may expect. However, it is remarkable that if $q = 20$, bias is reduced but not eliminated even when $n = 2000$. The variance estimates also exhibit heavy bias and coverage rates are below the nominal level. These results reflect that, in general, care must be taken before resorting to asymptotic arguments in Bayesian and frequentist semi-parametric theory, even when the nuisance space is assumed to be a Donsker class. Moreover, it may be the case that using a machine learning algorithm without theoretical guarantees on stochastic equicontinuity leads to good estimation, even when the number of covariates is high. However, theoretical guarantees on the asymptotic behavior or on the finite sample performance may not be available. Methods to restore posterior coverage at the nominal level, in the absence of the stochastic equicontinuity assumptions, are a subject of current studies by the authors and are not treated in this paper.

Table S.4: Characteristics of the frequentist and Bayesian estimators for different values of $q$ and $n$.
$q$     $n$      $(\tilde\theta_F, \tilde\theta_B)$    $(\tilde V_F, \tilde V_B, \hat\Sigma)$    Coverage
5       250      (3.01, 3.01)      (5.08, 5.36, 5.38)      94.80
5       500      (3.02, 3.02)      (4.71, 5.09, 5.11)      95.30
5       1000     (3.00, 3.00)      (4.88, 4.98, 4.99)      94.30
5       2000     (3.00, 3.00)      (4.97, 4.91, 4.90)      94.40
6       250      (3.02, 3.03)      (4.98, 5.61, 5.63)      95.60
6       500      (3.01, 3.02)      (5.42, 5.37, 5.37)      93.60
6       1000     (3.00, 3.00)      (4.78, 5.21, 5.22)      95.40
6       2000     (3.00, 3.00)      (4.82, 5.07, 5.09)      95.20
8       250      (3.05, 3.05)      (5.30, 6.03, 6.07)      95.30
8       500      (3.02, 3.03)      (5.43, 5.83, 5.84)      94.90
8       1000     (3.03, 3.02)      (5.10, 5.64, 5.64)      95.20
8       2000     (3.01, 3.02)      (5.03, 5.47, 5.47)      93.80
10      250      (3.07, 3.06)      (5.91, 6.32, 6.34)      93.30
10      500      (3.04, 3.04)      (5.57, 6.03, 6.04)      94.00
10      1000     (3.03, 3.03)      (5.48, 5.81, 5.82)      93.10
10      2000     (3.02, 3.02)      (4.86, 5.61, 5.63)      94.80
20      250      (3.11, 3.11)      (6.06, 7.18, 7.22)      91.80
20      500      (3.08, 3.08)      (5.57, 6.90, 6.91)      91.40
20      1000     (3.06, 3.06)      (5.81, 6.56, 6.58)      89.50
20      2000     (3.05, 3.06)      (5.69, 6.33, 6.33)      87.80

References

Bissiri, P. G., Holmes, C. C. & Walker, S. G. (2016). A general framework for updating belief distributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78, 1103–1130.
Chen, Q., Syrgkanis, V. & Austern, M. (2022). Debiased machine learning without sample-splitting for stable estimators. In Advances in Neural Information Processing Systems, vol. 35. Curran Associates, Inc.
Cheng, G. & Huang, J. Z. (2010). Bootstrap consistency for general semiparametric M-estimation. The Annals of Statistics 38, 2884–2915.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. K. & Robins, J. M. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal 21, C1–C68.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7, 1–26.
Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics 1, 209–230.
Hahn, P. R., Murray, J. S. & Carvalho, C. M. (2020). Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects. Bayesian Analysis 15, 965–1056.
Kaplan, D. & Chen, J. (2012). A two-step Bayesian approach for propensity score analysis: Simulations and case study. Psychometrika 77, 581–609.
Kosorok, M. (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer Series in Statistics. Springer New York.
Li, F., Ding, P. & Mealli, F. (2023). Bayesian causal inference: a critical review. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 381, 20220153.
Lyddon, S. P., Holmes, C. C. & Walker, S. G. (2019). General Bayesian updating and the loss-likelihood bootstrap. Biometrika 106, 465–478.
McCandless, L. C., Gustafson, P. & Austin, P. C. (2009). Bayesian propensity score analysis for observational data. Statistics in Medicine 28, 94–112.
Newey, W. K. (1994). The asymptotic variance of semiparametric estimators. Econometrica 62, 1349–1382.
Newton, M. A. & Raftery, A. E. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society. Series B (Methodological) 56, 3–48.
Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys 3, 96–146.
Praestgaard, J. & Wellner, J. A. (1993). Exchangeably weighted bootstraps of the general empirical process. The Annals of Probability 21, 2053–2086.
Robins, J. M., Rotnitzky, A. & Zhao, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 90, 106–121.
Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56, 931–954.
Rosenbaum, P. R. & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.
Rosenbaum, P. R. & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association 79, 516–524.
Rubin, D. B. (1977). Assignment to treatment group on the basis of a covariate. Journal of Educational Statistics 2, 1–26.
Rubin, D. B. (1979). Using multivariate matched sampling and regression adjustment to control bias in observational studies. Journal of the American Statistical Association 74, 318–328.
Sabbagh, M. (2025). Bayesian Causal Inference in Semi-Parametric Models. Ph.D. thesis, McGill University.
Sabbagh, M. & Stephens, D. A. (2026). Posterior uncertainty for targeted parameters in Bayesian bootstrap procedures. https://arxiv.org/abs/2602.02216.
Sherbert, D. & Bartle, R. (2020). Introduction to Real Analysis, Fourth Edition. Independently Published.
Stephens, D. A., Nobre, W. S., Moodie, E. E. M. & Schmidt, A. M. (2023). Causal inference under mis-specification: Adjustment based on the propensity score (with discussion). Bayesian Analysis 18, 639–694.
Yiu, A., Fong, E., Holmes, C. & Rousseau, J. (2025). Semiparametric posterior corrections. Journal of the Royal Statistical Society Series B: Statistical Methodology 87, 1025–1054.