Omitted-Variable Sensitivity Analysis for Generalizing Randomized Trials


Authors: Amir Asiaee, Samhita Pal, Jared D. Huling

Amir Asiaee¹, Samhita Pal¹, Jared D. Huling²

¹ Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA. ² Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN, USA. Correspondence to: Amir Asiaee <amir.asiaeetaheri@vumc.org>.

Preprint. March 31, 2026.

Abstract

Randomized controlled trials (RCTs) yield internally valid causal effect estimates, but generalizing these results to target populations with different characteristics requires an untestable selection ignorability assumption: conditional on observed covariates, trial participation must be independent of potential outcomes. This assumption fails when unobserved effect modifiers are distributed differently between trial and target populations. We develop a sensitivity analysis framework for trial generalization grounded in omitted variable bias (OVB). Our key theoretical contribution is an exact decomposition showing that external-validity bias equals moderation strength × moderator imbalance: (i) how strongly an unobserved variable shifts the treatment effect, times (ii) how differently that variable is distributed across populations after covariate adjustment. We introduce scale-free sensitivity parameters based on partial R² values, enabling closed-form bounds and benchmarking against observed covariates—practitioners can assess whether conclusions would change if an unobserved moderator were "as strong as" a particular observed variable. Simulations demonstrate that our bounds achieve nominal coverage and remain conservative under model misspecification, while comparisons with alternative sensitivity frameworks highlight the interpretive advantages of the OVB decomposition.

1. Introduction

Machine learning models trained on data from one distribution often fail when deployed to populations with different characteristics—a phenomenon known as distribution shift (Quiñonero-Candela et al., 2009; Koh et al., 2021). In causal inference, an analogous challenge arises when we attempt to generalize or transport treatment effect estimates from a randomized controlled trial (RCT) to a target population that differs from the experimental sample. Even when an RCT provides internally valid estimates of causal effects, these estimates may not apply to the population where a policy or intervention will actually be deployed (Stuart et al., 2011; Degtiar & Rose, 2023). This problem—the gap between internal and external validity—is increasingly recognized as a fundamental barrier to evidence-based decision-making (Westreich et al., 2017; Colnet et al., 2024).

The generalization problem. Consider a pharmaceutical company that conducts a clinical trial at urban academic medical centers but plans to market the resulting drug nationally. Or a tech company that A/B tests a new feature on power users who opt into beta programs but will deploy it to all users. In both cases, trial participants may systematically differ from the target population in ways that affect how they respond to treatment. When treatment effects are heterogeneous—varying across individuals based on their characteristics—these differences can lead to substantial discrepancies between the trial sample average treatment effect (SATE) and the target average treatment effect (TATE).

Standard solutions and their limitations. The methodological literature on generalizability and transportability provides estimators for the TATE by combining trial outcomes with covariate information from a target sample (Cole & Stuart, 2010; Stuart et al., 2011; Buchanan et al., 2018; Dahabreh et al., 2019; 2021).
These methods adjust for observed differences between trial and target populations using techniques such as:

• Inverse probability weighting (IPW): Reweight trial observations by the inverse odds of trial participation.
• Outcome modeling (g-formula): Fit a model predicting outcomes from treatment and covariates in the trial, then average predictions over the target covariate distribution.
• Doubly robust methods: Combine weighting and outcome modeling for robustness to model misspecification.

All such estimators require an untestable identification assumption: after conditioning on observed covariates X, trial participation S must be independent of potential outcomes Y(a). This selection ignorability assumption fails whenever there exist unmeasured effect modifiers—variables that both (i) influence how individuals respond to treatment and (ii) are distributed differently between trial and target populations.

Why sensitivity analysis? In practice, researchers can never be certain that all relevant effect modifiers have been measured. Subject-matter knowledge may suggest candidate unmeasured moderators, but their precise distributions and effect-modification strengths remain unknown. This motivates sensitivity analysis: systematic exploration of how conclusions change under hypothetical violations of the identification assumptions.

A well-designed sensitivity analysis should answer questions such as:

• How strong would an unmeasured effect modifier need to be to change the sign of the transported effect?
• How does the plausible range of the TATE expand as we allow for progressively larger violations?
• Are the violations required to overturn our conclusions plausible given domain knowledge?

Our contribution. We develop a sensitivity analysis for trial generalization based on omitted variable bias (OVB).
Our approach yields transparent, interpretable bounds that decompose external-validity bias into two distinct components: bias = moderation strength × moderator imbalance. Specifically, bias arises from (i) how strongly an unobserved variable U modifies the treatment effect, and (ii) how differently U is distributed between trial and target populations after adjusting for observed covariates.

We make the following contributions:

1. OVB decomposition for external validity (Section 4): Under a linear effect-moderation model, we derive an exact identity expressing the TATE estimation error as a product of moderator strength and moderator imbalance.
2. Partial R² sensitivity parameterization (Section 5): We introduce scale-free sensitivity parameters based on partial R² values, enabling comparisons across outcomes and benchmarking against observed covariates.
3. Robustness summaries and benchmarking (Section 6): We define "robustness values" quantifying the minimum confounding strength needed to change substantive conclusions, and show how to benchmark these against observed effect modifiers.
4. Practical workflow and experiments (Sections 7 and 8): We provide a complete estimation workflow and demonstrate the method's performance on synthetic and semi-synthetic data.

All proofs are deferred to Section A.

2. Related Work

Generalizability and transportability. The statistical literature on extending causal inferences from trials to target populations has grown substantially. Cole & Stuart (2010) introduced inverse probability of sampling weights for generalization, building on the potential outcomes framework. Stuart et al. (2011) developed propensity-score-based methods and provided practical guidance, while Tipton (2013) proposed subclassification approaches for educational experiments. Hartman et al.
(2015) showed how to combine experimental and observational data to estimate population treatment effects, and Kern et al. (2016) systematically compared methods including BART and weighting. Buchanan et al. (2018) extended these methods to complex survey designs, and Dahabreh et al. (2019; 2021) provided a comprehensive framework covering identification, estimation, and study design considerations. Lesko et al. (2017) offered a clear potential-outcomes perspective on the key assumptions. From a causal graphical perspective, Pearl & Bareinboim (2011) and Bareinboim & Pearl (2016) formalized when and how causal effects can be transported across environments using selection diagrams. Rudolph & van der Laan (2017) developed TMLE-based robust methods for transporting effects across sites. Recent reviews synthesize this literature: Degtiar & Rose (2023) provide a statistical overview, Ling et al. (2023) focus on practical applications, and Colnet et al. (2024) discuss connections to machine learning and policy learning (Athey & Wager, 2021).

Sensitivity analysis for external validity. While sensitivity analysis is well-developed for observational studies (Rosenbaum, 2002; Robins, 2000; Ding & VanderWeele, 2016), methods specifically addressing generalization are more recent. Nguyen et al. (2018) consider the case where a moderator is observed in the trial but not in the target dataset, developing bounds based on the moderator's effect-modification strength. Nie et al. (2021) propose optimization-based bounds combining marginal sensitivity models with covariate balancing constraints, allowing for flexible but computationally intensive analysis. Dahabreh et al.
(2023) parameterize violations via bias functions on the target counterfactual means, providing a general framework but with parameters that lack direct interpretation. Most closely related to our work, Huang (2024) develops a two-parameter sensitivity analysis for weighted generalization estimators with benchmarking capabilities; our approach differs by grounding the analysis in an explicit OVB decomposition that separates moderation strength from moderator imbalance. Our contribution connects the generalization literature to the partial-R² sensitivity framework of Cinelli & Hazlett (2020), yielding closed-form bounds with a transparent "strength × imbalance" interpretation and enabling direct benchmarking against observed covariates.

OVB and partial-R² sensitivity. In observational studies, omitted variable bias provides a foundational framework for understanding confounding (Cochran, 1973; Rosenbaum & Rubin, 1983). Cinelli & Hazlett (2020) transformed this framework by introducing partial R² parameters, which are scale-free and enable transparent benchmarking; their sensemakr package has been widely adopted for sensitivity analysis in regression settings. Blackwell (2014) and Oster (2019) provide related approaches in political science and economics, respectively. Chernozhukov et al. (2024) further develop a general OVB theory for a broad class of causal machine learning targets (including covariate-shift policy effects) using Riesz representers and debiased ML inference. Our trial generalization setting can be seen as an external-validity analogue: latent variables can simultaneously drive trial participation and induce treatment-effect heterogeneity. Section B sketches a mapping between our bounds and the general Riesz-representer OVB bounds.

3. Setup and Baseline Identification

3.1. Data Structure and Notation

Let S ∈ {0, 1} indicate population membership, where S = 1 denotes the randomized trial and S = 0 denotes the target population. Each unit i has:

• Baseline covariates X_i ∈ X ⊆ R^p
• Binary treatment assignment A_i ∈ {0, 1}
• Observed outcome Y_i ∈ R
• Potential outcomes Y_i(0) and Y_i(1) under control and treatment

We observe two datasets:

D_trial = {(X_i, A_i, Y_i) : S_i = 1}, i = 1, ..., n_r    (1)
D_target = {X_j : S_j = 0}, j = 1, ..., n_o    (2)

[Figure 1: causal diagram over X, U, A, Y, S.] Figure 1. Causal structure for trial generalization with an unobserved effect modifier U. Observed variables (X, A, Y, S) are shaded; latent U is unshaded with a dashed border. The dashed edge U ⇢ S indicates that U's distribution differs between trial (S = 1) and target (S = 0) populations, i.e., P(U | X, S = 1) ≠ P(U | X, S = 0). If U also modifies treatment effects (U → Y interaction with A), selection ignorability fails and standard transport estimators are biased.

In the trial, treatment is randomized; in the target, we observe only covariates (no treatments or outcomes). This "non-nested" design is common in practice, though our methods extend to nested designs where trial participants are sampled from the target (Dahabreh et al., 2021). Figure 1 illustrates the causal structure. The key challenge is that an unobserved variable U may simultaneously (i) modify treatment effects (the U → Y path that varies with A) and (ii) have a different distribution between trial and target populations (the dashed U ⇢ S relationship, indicating that P(U | X, S) depends on S).

3.2. Target Estimand

Our goal is to estimate the target average treatment effect (TATE):

τ* := E[Y(1) − Y(0) | S = 0]    (3)

This is the average causal effect of treatment in the population where the intervention will be deployed, not the trial population.
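As a toy numerical illustration of why τ* can differ from the trial's average effect (all distributions and coefficients below are hypothetical, chosen only for exposition): a heterogeneous effect combined with a covariate shift between populations moves the two averages apart.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heterogeneous treatment effect: tau(x) = 1 + 0.5 * x.
def tau(x):
    return 1.0 + 0.5 * x

# Trial over-samples low-x units; the target population is centered higher.
x_trial = rng.normal(loc=-1.0, scale=1.0, size=100_000)
x_target = rng.normal(loc=1.0, scale=1.0, size=100_000)

trial_avg_effect = tau(x_trial).mean()  # what the trial alone reports
tate = tau(x_target).mean()             # tau* = E[Y(1) - Y(0) | S = 0]

print(round(trial_avg_effect, 2), round(tate, 2))  # roughly 0.5 and 1.5
```

Here the trial average understates τ* by a full unit, purely because the effect modifier x is distributed differently across the two populations.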
For comparison, the sample average treatment effect (SATE) in the trial is:

τ_SATE := E[Y(1) − Y(0) | S = 1]    (4)

When treatment effects are heterogeneous and the trial and target populations differ in their covariate distributions, τ* and τ_SATE may differ substantially.

3.3. Trial Internal Validity

We maintain standard assumptions ensuring the trial provides valid causal estimates for its own population.

Table 1. Positioning of our approach relative to representative sensitivity analyses for trial generalization. The goal is not to replace prior frameworks, but to provide an OVB-based lens that yields closed-form, benchmarkable sensitivity summaries.

Method | Sensitivity parameterization | Closed form | Benchmarking | Primary focus
Nguyen et al. (2018) | Moderation × imbalance | Yes | Limited | Missing moderators observed in trial only
Nie et al. (2021) | Marginal sensitivity model | No | No | Outcome shift via odds-ratio constraints
Dahabreh et al. (2023) | Bias functions | Yes | No | Selection bias in transport estimators
Huang (2024) | Two-parameter model | Yes | Yes | Sensitivity of weighted estimators
Ours | OVB + partial-R² | Yes | Yes | External validity via OVB decomposition

Assumption 3.1 (Trial internal validity). Within the trial (S = 1):

1. Consistency: Y = Y(A) almost surely.
2. Randomization: (Y(0), Y(1)) ⊥⊥ A | X, S = 1.
3. Positivity: 0 < P(A = 1 | X, S = 1) < 1 almost surely.

Under Assumption 3.1, the conditional average treatment effect (CATE) in the trial population is identified:

τ_r(x) := E[Y(1) − Y(0) | X = x, S = 1] = μ_1^r(x) − μ_0^r(x)    (5)

where μ_a^r(x) := E[Y | A = a, X = x, S = 1] is the conditional mean outcome in the trial.

3.4. Baseline Transport Assumption

To identify τ* from the available data, standard approaches assume:

Assumption 3.2 (Selection ignorability).
For each a ∈ {0, 1}:

Y(a) ⊥⊥ S | X    (6)

Assumption 3.2 states that, conditional on observed covariates, potential outcomes are identically distributed across trial and target populations. Equivalently, X captures all variables that both (i) affect the outcome and (ii) are distributed differently between populations.

Proposition 3.3 (Identification under selection ignorability). Under Assumptions 3.1 and 3.2 and the additional positivity assumption P(S = 1 | X = x) > 0 for all x in the target support, the TATE is identified by:

τ* = E[τ_r(X) | S = 0] = E[μ_1^r(X) − μ_0^r(X) | S = 0]    (7)

The proof follows directly from the tower property and Assumption 3.2. In practice, we estimate μ_a^r(x) from trial data and average over the target covariate distribution.

Remark 3.4 (Alternative estimators). Equation (7) suggests an outcome-modeling (g-formula) estimator. Alternatively, one can use inverse probability weighting with selection odds:

τ̂_IPW = Σ_{i:S_i=1} w(X_i) A_i Y_i / Σ_{i:S_i=1} w(X_i) A_i − Σ_{i:S_i=1} w(X_i)(1 − A_i) Y_i / Σ_{i:S_i=1} w(X_i)(1 − A_i)    (8)

where w(x) = (1 − ê(x)) / ê(x) are the inverse-odds-of-trial-participation weights and ê(x) = P̂(S = 1 | X = x) (Buchanan et al., 2018; Westreich et al., 2017). Doubly robust estimators combine outcome modeling and weighting for robustness to misspecification of either model (Dahabreh et al., 2019).

4. An OVB Sensitivity Model for Generalization

We now relax Assumption 3.2 by allowing for an unobserved effect modifier U whose distribution differs between trial and target populations.

4.1. Latent Moderator Model

Assumption 4.1 (Latent moderator bridge). There exists an unobserved random variable U such that for each a ∈ {0, 1}:

Y(a) ⊥⊥ S | (X, U)    (9)

Assumption 4.1 weakens Assumption 3.2: selection ignorability holds conditional on (X, U) rather than X alone. If U were observed, we could adjust for it and identify τ*.
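This can be seen in a minimal simulation (all coefficients illustrative): a g-formula transport estimator that conditions on (X, U) recovers the target effect, while the same estimator conditioning on X alone is biased.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative DGP: X independent of S, but U shifts with S given X.
s = rng.integers(0, 2, size=n)               # 1 = trial, 0 = target
x = rng.normal(size=n)
u = 0.4 * x - 0.6 * s + rng.normal(size=n)   # lower U in the trial
a = rng.integers(0, 2, size=n)               # randomized treatment
y = x + a * (2.0 + 1.5 * u) + rng.normal(size=n)  # only trial rows of y are used

tr, tg = s == 1, s == 0

def transport(features):
    """Fit E[Y | A, features] on the trial by OLS with a full treatment
    interaction, then average the implied CATE over the target."""
    F = np.column_stack([np.ones(n)] + features)
    D = np.column_stack([F, a[:, None] * F])[tr]
    coef = np.linalg.lstsq(D, y[tr], rcond=None)[0]
    k = F.shape[1]
    return (F[tg] @ coef[k:]).mean()  # interaction part = CATE

tau_x_only = transport([x])       # biased: U is omitted
tau_x_and_u = transport([x, u])   # recovers the target effect
tau_true = (2.0 + 1.5 * u[tg]).mean()
```

In this toy design the X-only transport estimate sits near 1.1 while the true target effect is near 2.0; the gap of about 0.9 is exactly the strength × imbalance product derived below.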
The problem is that U is unmeasured, so we cannot directly control for it.

Example 4.2 (Unmeasured effect modifiers). In a clinical trial for a cardiovascular drug:

• U = genetic variants affecting drug metabolism
• U = patient adherence patterns
• U = access to complementary care

These may modify treatment effects and be distributed differently across trial sites vs. the national population.

4.2. Linear Effect-Moderation Model

To derive tractable sensitivity bounds, we impose a linear structure on how U affects potential outcomes.

Assumption 4.3 (Linear effect modification). For a ∈ {0, 1}, the conditional mean potential outcome satisfies:

E[Y(a) | X, U, S] = m_a(X) + η_a(X) · U    (10)

where m_a(X) captures the X-dependent baseline and η_a(X) captures effect modification by U. Without loss of generality, we center U so that E[U | X] = 0.

The key quantity is the moderation strength:

β(X) := η_1(X) − η_0(X)    (11)

This measures how a one-unit increase in U changes the treatment effect at covariate value X.

Remark 4.4 (Interpretation). Assumption 4.3 is a first-order Taylor approximation to any smooth conditional mean function. It parallels the linear sensitivity models used in observational studies (Cinelli & Hazlett, 2020) and captures the key "strength × imbalance" structure. We discuss nonlinear extensions in Section 9.

4.3. The OVB Decomposition

Define the moderator imbalance between trial and target:

∆_U(X) := E[U | X, S = 0] − E[U | X, S = 1]    (12)

This measures how differently U is distributed across populations, conditional on X. If ∆_U(X) = 0 for all X, then Assumption 3.2 holds despite U being unmeasured.

Define the X-adjusted transport estimand:

τ_X := E[τ_r(X) | S = 0]    (13)

This is what standard transport estimators target.

Lemma 4.5 (External-validity OVB identity).
Under Assumptions 3.1, 4.1 and 4.3:

τ* = τ_X + E[β(X) · ∆_U(X) | S = 0]    (14)

In particular, if β(X) ≡ β is constant:

τ* = τ_X + β · ∆*_U    (15)

where ∆*_U := E[∆_U(X) | S = 0]. Proof in Section A.1.

Interpretation. Lemma 4.5 decomposes external-validity bias into two conceptually distinct components:

1. Moderation strength β(X): How strongly does U modify treatment effects?
2. Moderator imbalance ∆_U(X): How differently is U distributed between trial and target (after X-adjustment)?

If either component is zero, selection ignorability holds and τ* = τ_X. Bias requires both treatment-effect heterogeneity driven by U and differential distribution of U across populations.

4.4. Simple Sensitivity Interval

Lemma 4.5 immediately yields sensitivity bounds.

Corollary 4.6 (Raw sensitivity interval). Suppose |β(X)| ≤ Γ and |∆_U(X)| ≤ Λ almost surely. Then:

τ* ∈ [τ_X − ΓΛ, τ_X + ΓΛ]    (16)

The parameters (Γ, Λ) have direct interpretations:

• Γ: Maximum effect modification—how much can a one-unit increase in U change the treatment effect?
• Λ: Maximum imbalance—how much can the mean of U differ between trial and target at any X?

While intuitive, these parameters are scale-dependent and hard to benchmark. We address this next.

5. Partial R² Parameterization

We now derive a scale-free reparameterization using partial R² values, following the approach of Cinelli & Hazlett (2020).

5.1. Residualized Variables

Let Π_X[·] denote the L² projection onto functions of X. Define the residualized variables:

Ũ := U − Π_X[U] = U − E[U | X]    (17)
S̃ := S − Π_X[S] = S − P(S = 1 | X)    (18)
τ̃ := τ − Π_X[τ]    (19)

where τ := Y(1) − Y(0) is the (latent) individual treatment effect and τ(X, U) := E[τ | X, U] is the full-information CATE.
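As a quick numerical check of the identity (15) in Lemma 4.5, the following sketch collapses X to a constant (so that τ_X reduces to the trial mean effect; all values illustrative) and verifies that the decomposition holds exactly, not just approximately.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 1.5  # constant moderation strength beta(X) = beta
n = 500_000

# With X collapsed to a constant, tau_X = E[tau | S = 1].
u_trial = rng.normal(0.0, 1.0, size=n)   # E[U | S = 1] = 0
u_target = rng.normal(0.8, 1.0, size=n)  # E[U | S = 0] = 0.8

def tau(u):
    # individual treatment effect under the linear moderation model (10)
    return 2.0 + beta * u

tau_x = tau(u_trial).mean()                  # X-adjusted transport estimand
tau_star = tau(u_target).mean()              # true TATE
delta_u = u_target.mean() - u_trial.mean()   # moderator imbalance Delta*_U

# Lemma 4.5, eq. (15): tau* = tau_X + beta * Delta*_U, exactly.
assert abs(tau_star - (tau_x + beta * delta_u)) < 1e-9
```

The bias term β · ∆*_U here is 1.5 × 0.8 = 1.2: either setting β = 0 (no moderation) or ∆*_U = 0 (no imbalance) would make the transported estimand coincide with τ*.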
Define residual variances:

σ²_{τ|X} := Var(τ̃)    (20)
σ²_{S|X} := Var(S̃) = Var(S)(1 − R²_{S∼X})    (21)

where R²_{S∼X} is the population R² of S regressed on X. Since S is binary, Var(S) = π(1 − π) where π := P(S = 1) under the reference distribution for (S, X).

5.2. Partial R² Sensitivity Parameters

Assumption 5.1 (Linear projection structure). After residualizing on X, both the treatment effect and selection admit linear projections on U:

τ̃ = b · Ũ + ε_τ    (22)
S̃ = g · Ũ + ε_S    (23)

with Cov(Ũ, ε_τ) = Cov(Ũ, ε_S) = 0.

Define the partial R² parameters:

R²_{τ∼U|X} := Var(bŨ) / Var(τ̃) = b² Var(Ũ) / σ²_{τ|X} ∈ [0, 1]    (24)
R²_{S∼U|X} := Var(gŨ) / Var(S̃) = g² Var(Ũ) / σ²_{S|X} ∈ [0, 1]    (25)

These have clear interpretations:

• R²_{τ∼U|X}: Proportion of residual treatment-effect variance explained by U (after X-adjustment).
• R²_{S∼U|X}: Proportion of residual selection variance explained by U (after X-adjustment).

5.3. Partial R² Bound

Assumption 5.2 (Constant X-adjusted imbalance). The moderator imbalance does not vary with X: ∆_U(X) ≡ ∆*_U almost surely.

Remark 5.3 (When is constant imbalance reasonable?). Assumption 5.2 holds when trial participation shifts the mean of U by a constant amount after adjusting for X. If ∆_U(X) varies with X, then the mapping from the bias term ∆*_U = E[∆_U(X) | S = 0] to the selection partial R² is no longer determined by (R²_{S∼U|X}, R²_{S∼X}) alone; in that case, the raw (Γ, Λ) bound in Corollary 4.6 remains valid.

Theorem 5.4 (Partial-R² bound for external-validity bias). Under Assumptions 3.1, 4.1, 4.3, 5.1 and 5.2 with constant β(X) ≡ β:

|τ* − τ_X| ≤ σ_{τ|X} √( R²_{τ∼U|X} · R²_{S∼U|X} / ( Var(S)(1 − R²_{S∼X}) ) )    (26)

Proof in Section A.2.

Interpretation. The bound (26) separates three ingredients:

1. σ_{τ|X}: Residual treatment-effect heterogeneity after X-adjustment (a scale factor).
2. R²_{τ∼U|X}: How much of this heterogeneity could U explain?
3. R²_{S∼U|X} / {Var(S)(1 − R²_{S∼X})}: How strongly could U drive selection?

Large bias requires substantial residual effect heterogeneity and substantial residual selection, both attributable to the same U.

6. Robustness Values and Benchmarking

6.1. Robustness Values

The partial-R² bound yields robustness values—minimum confounding strength needed to change conclusions.

Proposition 6.1 (Robustness value). Let B > 0 be a target bias magnitude (e.g., B = |τ̂_X| to flip the sign). Under Theorem 5.4, any unobserved moderator must satisfy:

R²_{τ∼U|X} · R²_{S∼U|X} ≥ Var(S)(1 − R²_{S∼X}) (B / σ_{τ|X})²    (27)

to induce bias at least B. The robustness value (RV) is the right-hand side of (27). A larger RV indicates greater robustness: conclusions can only change if an unobserved moderator simultaneously explains a large share of both residual treatment-effect variation and residual selection.

6.2. Benchmarking Against Observed Covariates

A powerful feature of the partial-R² parameterization is the ability to benchmark against observed variables.

Procedure. Treat an observed covariate Z ∈ X as if it were unobserved:

1. Compute the partial R² of selection explained by Z given X₋Z:

R²_{S∼Z|X₋Z} = (R²_{S∼X} − R²_{S∼X₋Z}) / (1 − R²_{S∼X₋Z})    (28)

2. Estimate the partial R² of treatment effect explained by Z given X₋Z (using effect-modification regressions in the trial).
3. Compare R²_{S∼Z|X₋Z} · R²_{τ∼Z|X₋Z} to the robustness value RV.

If the product for observed covariates is smaller than RV, then an unobserved moderator would need to be stronger than any observed variable to overturn conclusions.

Example 6.2 (Benchmark interpretation).
Suppose:

• RV for sign reversal = 0.04
• Strongest observed moderator: age, with R²_{S∼age} · R²_{τ∼age} = 0.02

To reverse the sign, an unobserved moderator would need to be twice as strong as age in its combined selection and effect-modification relationships.

Algorithm 1: OVB Sensitivity Analysis for Trial Generalization

Require: Trial data {(X_i, A_i, Y_i)}_{i:S_i=1}, target covariates {X_j}_{j:S_j=0}
Require: Sensitivity parameters: (Γ, Λ) or (R²_{τ∼U|X}, R²_{S∼U|X})
1: Fit outcome model on trial: μ̂_a(x)
2: Compute baseline estimate: τ̂_X = (1/n_o) Σ_{j:S_j=0} [μ̂_1(X_j) − μ̂_0(X_j)]
3: Construct baseline CI: CI_base via bootstrap
4: Estimate R²_{S∼X} from pooled (S, X) data
5: Estimate σ̂_{τ|X} (see Section 7.2)
6: Compute bias bound b using Corollary 4.6 or Equation (26)
7: Output: Sensitivity interval CI_sens = [τ_L − b, τ_U + b]

7. Estimation and Practical Workflow

7.1. Complete Workflow

Algorithm 1 summarizes the complete workflow.

7.2. Estimating Component Quantities

Baseline transported estimate. Any standard generalization estimator can be used: outcome modeling, IPW, or doubly robust. We recommend cross-fitting for valid inference (Dahabreh et al., 2019).

R²_{S∼X}. Estimable from pooled (S, X) data via a linear probability model or logistic regression (using pseudo-R²).

σ_{τ|X}. This residual CATE standard deviation is not point-identified without assumptions about individual treatment effects. Options include:

1. Treat as sensitivity parameter: Report curves over a plausible range.
2. Upper bound: Use √(Var(Y | A = 1, X) + Var(Y | A = 0, X)) from trial data.
3. Structural assumption: Assume constant within-person correlation between potential outcomes.

7.3. Combining Sampling and Sensitivity Uncertainty

Let CI_base = [τ_L, τ_U] be a (1 − α) confidence interval for τ_X under selection ignorability.
A conservative sensitivity interval is:

CI_sens = [τ_L − b, τ_U + b]    (29)

This "inflate then report" approach is standard in sensitivity analysis. Sharper inference combining sampling and sensitivity uncertainty is possible but beyond our scope.

[Figure 2: "Coverage vs Moderation Strength (Γ)"; coverage plotted against Γ; n_reps = 200; Λ = 0.25 (oracle).] Figure 2. Monte Carlo coverage vs. moderation bound Γ in a linear-Gaussian DGP. Coverage is 0% for Γ < Γ* and jumps to 100% at Γ = Γ* (green dashed).

8. Experiments

We evaluate the proposed sensitivity bounds in synthetic and semi-synthetic settings; full experimental details and additional results are reported in Section C.

Controlled shift with known ground truth. We simulate a two-population design in which an unobserved effect modifier U is distributed differently between the trial and target, inducing external-validity bias with oracle quantities available for validation (Section C).

Coverage and calibration. Figure 2 shows that the bias envelope is tight: coverage is 0% when the moderation bound Γ understates the true moderation strength and becomes valid once Γ ≥ Γ*.

Full confidence intervals. Figure 3 compares the sensitivity bias envelope alone versus combined with sampling uncertainty. The "Full CI" (adding 95% bootstrap intervals to the sensitivity bounds) achieves 100% coverage even at Γ = 0, since sampling uncertainty alone spans the true effect when the point estimate is unbiased on average. This highlights that our sensitivity bounds quantify external-validity bias, complementing rather than replacing standard inferential uncertainty.

Benchmarking and robustness. In a "hide one moderator" benchmark, observed-covariate partial-R² values provide a concrete scale for sensitivity parameters.
Figure 4 compares these benchmarks to the sign-reversal robustness threshold; additional benchmarking plots appear in Section C.

[Figure 3: "Coverage: Bias-Only Envelope vs Full Sensitivity CI"; Monte Carlo coverage vs. moderation strength Γ for the two interval types; n_reps = 200; Λ = 0.25 (oracle); B = 500 bootstrap samples.] Figure 3. Coverage comparison: bias envelope only (solid) vs. full confidence interval combining sensitivity bounds with bootstrap uncertainty (dashed). The full CI achieves valid coverage even at Γ = 0 due to sampling variability.

[Figure 4: "Benchmarking: Observed Covariates vs Robustness Threshold"; covariates X1–X5 plotted by R²_{S∼Z|X₋j} (x-axis) vs. R²_{τ∼Z|X₋j} (y-axis); RV = 0.292.] Figure 4. Benchmarking scatter plot. Each observed covariate is plotted by its partial R² for selection (x-axis) and treatment effect (y-axis); the dashed line is the sign-reversal robustness threshold.

9. Discussion

We develop a sensitivity analysis framework for trial generalization that decomposes external-validity bias into the strength of effect modification by an unobserved variable and its imbalance between trial and target populations.
This decomposition leads to closed-form sensitivity bounds and an interpretable calibration via robustness values and benchmarking against observed covariates. Algorithm 1 summarizes a practical workflow for reporting both a baseline transported estimate and a sensitivity envelope.

Limitations and modeling assumptions. Our tightest bounds rely on a linear effect-moderation bridge (Assumption 4.3) and, for the partial-R² mapping, a constant X-adjusted imbalance (Assumption 5.2). The linear model assumption is restrictive but serves as a transparent baseline; when treatment effect heterogeneity is approximately linear in U conditional on X, the bounds remain informative. The constant imbalance assumption simplifies the relationship between the two sensitivity parameters, enabling a single-parameter robustness summary.

In practice, researchers should view our bounds as providing a structured sensitivity analysis rather than exact confidence intervals. When the assumptions are violated, the bounds may be either conservative or anti-conservative depending on the nature of the misspecification. Monte Carlo studies (Section 8) suggest the bounds perform well even under moderate departures from linearity.

Practical guidance. We recommend the following workflow for applied researchers: (i) report the baseline transported estimate τ̂_X with confidence intervals accounting for sampling uncertainty; (ii) compute robustness values RV_0 (sign reversal) and RV_q (clinical threshold) to summarize how strong an unobserved moderator must be to change conclusions; (iii) benchmark these values against observed covariates to assess plausibility; (iv) present sensitivity contour plots showing how the interval expands across a grid of (R²_{τ∼U|X}, R²_{S∼U|X}) values. This approach separates the empirical estimation (step i) from the sensitivity analysis (steps ii–iv), making the role of assumptions transparent.
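Steps (ii) and (iv) of this workflow involve only the closed-form quantities in equations (26) and (27); a minimal sketch of the arithmetic (all input values hypothetical, chosen for illustration):

```python
import numpy as np

def bias_bound(sigma_tau_x, r2_tau_u, r2_s_u, var_s, r2_s_x):
    """Partial-R^2 bound on |tau* - tau_X|, as in eq. (26)."""
    return sigma_tau_x * np.sqrt(r2_tau_u * r2_s_u / (var_s * (1.0 - r2_s_x)))

def robustness_value(B, sigma_tau_x, var_s, r2_s_x):
    """Minimum product R2_{tau~U|X} * R2_{S~U|X} needed for bias >= B, eq. (27)."""
    return var_s * (1.0 - r2_s_x) * (B / sigma_tau_x) ** 2

# Hypothetical inputs (in practice, estimated as in Section 7.2):
tau_x = 1.2          # baseline transported estimate
sigma_tau_x = 2.0    # residual treatment-effect heterogeneity sigma_{tau|X}
var_s = 0.25         # pi * (1 - pi) with pi = P(S = 1) = 0.5
r2_s_x = 0.3         # R^2 of selection on observed covariates

b = bias_bound(sigma_tau_x, r2_tau_u=0.1, r2_s_u=0.1, var_s=var_s, r2_s_x=r2_s_x)
interval = (tau_x - b, tau_x + b)  # sensitivity envelope around tau_X
rv = robustness_value(abs(tau_x), sigma_tau_x, var_s, r2_s_x)  # sign-reversal RV
```

With these inputs the envelope is roughly τ̂_X ± 0.48 and the sign-reversal RV is 0.063: any unobserved moderator flipping the sign would need its two partial R² values to multiply to at least 0.063, a figure that can then be compared against observed-covariate benchmarks as in step (iii).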
Connections to general OVB theory. A complementary route to sensitivity analysis views trial generalization as a covariate-shift problem and applies the general OVB framework of Chernozhukov et al. (2024), which expresses bias as the covariance between regression and Riesz representer approximation errors. Section B sketches this connection; developing sharp RR-based bounds with modern ML nuisance estimators is a natural extension.

Future directions. Several extensions merit investigation: (i) allowing X-varying moderation coefficients β(X) while maintaining tractable bounds; (ii) incorporating sampling uncertainty in the sensitivity parameters themselves; (iii) extending to multi-arm trials and factorial designs; (iv) developing sensitivity analyses for subgroup-specific treatment effects. The OVB decomposition provides a foundation for these extensions by clearly separating the sources of external-validity bias.

Impact Statement

This paper presents work whose goal is to advance methods for assessing the external validity of causal effect estimates from randomized trials. By providing tools to quantify uncertainty about generalization, we aim to improve evidence-based decision-making in medicine, policy, and technology deployment. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

Athey, S. and Wager, S. Policy learning with observational data. Econometrica, 89(1):133–161, 2021. doi: 10.3982/ECTA15732.

Bareinboim, E. and Pearl, J. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113(27):7345–7352, 2016. doi: 10.1073/pnas.1510507113.

Blackwell, M. A selection bias approach to sensitivity analysis for causal effects. Political Analysis, 22(2):169–182, 2014. doi: 10.1093/pan/mpt006.

Buchanan, A. L., Hudgens, M. G., Cole, S.
R., Mollan, K. R., Sax, P. E., Daar, E. S., Adimora, A. A., Eron, J. J., and Mugavero, M. J. Generalizing evidence from randomized trials using inverse probability of sampling weights. Journal of the Royal Statistical Society: Series A (Statistics in Society), 181(4):1193–1209, 2018. doi: 10.1111/rssa.12357.

Chernozhukov, V., Cinelli, C., Newey, W., Sharma, A., and Syrgkanis, V. Long story short: Omitted variable bias in causal machine learning. arXiv preprint arXiv:2112.13398, 2024. Version v5, May 2024.

Cinelli, C. and Hazlett, C. Making sense of sensitivity: Extending omitted variable bias. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(1):39–67, 2020. doi: 10.1111/rssb.12348.

Cochran, W. G. Controlling bias in observational studies: A review. Sankhyā: The Indian Journal of Statistics, Series A, 35(4):417–446, 1973.

Cole, S. R. and Stuart, E. A. Generalizing evidence from randomized clinical trials to target populations: The ACTG 320 trial. American Journal of Epidemiology, 172(1):107–115, 2010. doi: 10.1093/aje/kwq084.

Colnet, B., Mayer, I., Varoquaux, G., Scornet, E., and Josse, J. Causal inference methods for combining randomized trials and observational studies: A review. Statistical Science, 39(1):165–191, 2024. doi: 10.1214/23-STS889.

Dahabreh, I. J., Robertson, S. E., Tchetgen Tchetgen, E. J., Stuart, E. A., and Hernán, M. A. Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals. Biometrics, 75(2):685–694, 2019. doi: 10.1111/biom.13009.

Dahabreh, I. J., Haneuse, S. J.-P. A., Robins, J. M., Robertson, S. E., Buchanan, A. L., Stuart, E. A., and Hernán, M. A. Study designs for extending causal inferences from a randomized trial to a target population. American Journal of Epidemiology, 190(8):1632–1642, 2021. doi: 10.1093/aje/kwaa270.

Dahabreh, I. J., Robins, J. M., Haneuse, S. J.-P. A., Saeed, I., Robertson, S. E., Stuart, E.
A., and Hernán, M. A. Sensitivity analysis using bias functions for studies extending inferences from a randomized trial to a target population. Statistics in Medicine, 42(13):2029–2043, 2023. doi: 10.1002/sim.9550.

Degtiar, I. and Rose, S. A review of generalizability and transportability. Annual Review of Statistics and Its Application, 10:501–524, 2023. doi: 10.1146/annurev-statistics-042522-103837.

Ding, P. and VanderWeele, T. J. Sensitivity analysis without assumptions. Epidemiology, 27(3):368–377, 2016. doi: 10.1097/EDE.0000000000000457.

Hartman, E., Grieve, R., Ramsahai, R., and Sekhon, J. S. From sample average treatment effect to population average treatment effect on the treated: Combining experimental with observational studies to estimate population treatment effects. Journal of the Royal Statistical Society: Series A (Statistics in Society), 178(3):757–778, 2015. doi: 10.1111/rssa.12094.

Huang, M. Y. Sensitivity analysis for the generalization of experimental results. Journal of the Royal Statistical Society: Series A (Statistics in Society), 187(4):900–918, 2024. doi: 10.1093/jrsssa/qnae012.

Kern, H. L., Stuart, E. A., Hill, J., and Green, D. P. Assessing methods for generalizing experimental impact estimates to target populations. Journal of Research on Educational Effectiveness, 9(1):103–127, 2016. doi: 10.1080/19345747.2015.1060282.

Koh, P. W., Sagawa, S., Marklund, H., Xie, S. M., Zhang, M., Balsubramani, A., Hu, W., Yasunaga, M., Phillips, R. L., Gao, I., Lee, T., David, E., Stavness, I., Guo, W., Earnshaw, B. A., Haque, I. S., Beery, S. M., Leskovec, J., Kundaje, A., Pierson, E., Levine, S., Finn, C., and Liang, P. WILDS: A benchmark of in-the-wild distribution shifts. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of PMLR, pp. 5637–5664, 2021.

Lesko, C. R., Buchanan, A. L., Westreich, D., Edwards, J. K., Hudgens, M. G., and Cole, S. R.
Generalizing study results: A potential outcomes perspective. Epidemiology, 28(4):553–561, 2017. doi: 10.1097/EDE.0000000000000664.

Ling, A. Y., Montez-Rath, M. E., Desai, M., Sebastien, B., Saeed, I., and Stuart, E. A. An overview of current methods for real-world applications to generalize or transport clinical trial findings to target populations of interest. Epidemiology, 34(5):627–636, 2023. doi: 10.1097/EDE.0000000000001633.

Nguyen, T. Q., Ackerman, B., Schmid, I., Cole, S. R., and Stuart, E. A. Sensitivity analyses for effect modifiers not observed in the target population when generalizing treatment effects from a randomized controlled trial: Assumptions, models, effect scales, data scenarios, and implementation details. PLOS ONE, 13(12):e0208795, 2018. doi: 10.1371/journal.pone.0208795.

Nie, X., Imbens, G., and Wager, S. Covariate balancing sensitivity analysis for extrapolating randomized trials across locations. arXiv preprint arXiv:2112.04723, 2021.

Oster, E. Unobservable selection and coefficient stability: Theory and evidence. Journal of Business & Economic Statistics, 37(2):187–204, 2019. doi: 10.1080/07350015.2016.1227711.

Pearl, J. and Bareinboim, E. Transportability of causal and statistical relations: A formal approach. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, pp. 247–254. AAAI Press, 2011.

Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., and Lawrence, N. D. Dataset Shift in Machine Learning. MIT Press, 2009. ISBN 9780262170055.

Robins, J. M. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. Statistical Models in Epidemiology, the Environment, and Clinical Trials, 116:1–94, 2000.

Rosenbaum, P. R. Observational Studies. Springer Series in Statistics, 2002.

Rosenbaum, P. R. and Rubin, D. B.
Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society: Series B (Methodological), 45(2):212–218, 1983. doi: 10.1111/j.2517-6161.1983.tb01242.x.

Rudolph, K. E. and van der Laan, M. J. Robust estimation of encouragement design intervention effects transported across sites. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(5):1509–1525, 2017. doi: 10.1111/rssb.12213.

Stuart, E. A., Cole, S. R., Bradshaw, C. P., and Leaf, P. J. The use of propensity scores to assess the generalizability of results from randomized trials. Journal of the Royal Statistical Society: Series A (Statistics in Society), 174(2):369–386, 2011. doi: 10.1111/j.1467-985X.2010.00673.x.

Tipton, E. Improving generalizations from experiments using propensity score subclassification: Assumptions, properties, and contexts. Journal of Educational and Behavioral Statistics, 38(3):239–266, 2013. doi: 10.3102/1076998612441947.

Westreich, D., Edwards, J. K., Lesko, C. R., Stuart, E., and Cole, S. R. Transportability of trial results using inverse odds of sampling weights. American Journal of Epidemiology, 186(8):1010–1014, 2017. doi: 10.1093/aje/kwx164.

A. Detailed Proofs

A.1. Proof of Theorem 4.5 (Detailed)

We provide a step-by-step derivation of the OVB identity.

Proof. Step 1: Conditional expectations under the linear model. Fix an arbitrary covariate value x and population indicator s ∈ {0, 1}. By Theorem 4.1, Y(a) ⊥⊥ S | (X, U), so the conditional distribution of Y(a) given (X, U) is the same in both populations.
By the law of iterated expectations:

E[Y(a) | X = x, S = s] = E[ E[Y(a) | X = x, U, S = s] | X = x, S = s ]    (30)

By Theorem 4.3:

E[Y(a) | X = x, U, S] = m_a(x) + η_a(x) · U    (31)

Substituting:

E[Y(a) | X = x, S = s] = E[ m_a(x) + η_a(x) · U | X = x, S = s ]    (32)
                       = m_a(x) + η_a(x) · E[U | X = x, S = s]    (33)

Step 2: CATE in each population. The conditional average treatment effect at X = x in population s is:

τ_s(x) := E[Y(1) − Y(0) | X = x, S = s]    (34)
        = E[Y(1) | X = x, S = s] − E[Y(0) | X = x, S = s]    (35)
        = [m_1(x) + η_1(x) · E[U | X = x, S = s]]    (36)
          − [m_0(x) + η_0(x) · E[U | X = x, S = s]]    (37)
        = [m_1(x) − m_0(x)] + [η_1(x) − η_0(x)] · E[U | X = x, S = s]    (38)
        = τ_0(x) + β(x) · E[U | X = x, S = s]    (39)

where τ_0(x) := m_1(x) − m_0(x) is the baseline CATE (the CATE if U = 0) and β(x) := η_1(x) − η_0(x) is the moderation strength.

Step 3: Difference between populations. The CATE in the target population is:

τ_{S=0}(x) := E[Y(1) − Y(0) | X = x, S = 0] = τ_0(x) + β(x) · E[U | X = x, S = 0]    (40)

The CATE in the trial population is:

τ_{S=1}(x) := E[Y(1) − Y(0) | X = x, S = 1] = τ_0(x) + β(x) · E[U | X = x, S = 1]    (41)

The difference is:

τ_{S=0}(x) − τ_{S=1}(x) = β(x) · [E[U | X = x, S = 0] − E[U | X = x, S = 1]]    (42)
                        = β(x) · Δ_U(x)    (43)

where Δ_U(x) := E[U | X = x, S = 0] − E[U | X = x, S = 1] is the moderator imbalance at X = x.

Step 4: Averaging over the target distribution. The TATE is:

τ* = E[τ_{S=0}(X) | S = 0] = E[τ_0(X) + β(X) · E[U | X, S = 0] | S = 0]    (44)

The X-adjusted transport estimand is:

τ_X = E[τ_{S=1}(X) | S = 0] = E[τ_0(X) + β(X) · E[U | X, S = 1] | S = 0]    (45)

Note that τ_r(X) = τ_{S=1}(X) is the trial CATE at X.
Subtracting:

τ* − τ_X = E[τ_{S=0}(X) − τ_{S=1}(X) | S = 0]    (46)
         = E[β(X) · Δ_U(X) | S = 0]    (47)

Rearranging gives Equation (14):

τ* = τ_X + E[β(X) · Δ_U(X) | S = 0]    (48)

Step 5: Constant moderation case. If β(X) ≡ β is constant, factor it out:

τ* − τ_X = E[β · Δ_U(X) | S = 0]    (49)
         = β · E[Δ_U(X) | S = 0]    (50)
         = β · Δ*_U    (51)

where Δ*_U := E[Δ_U(X) | S = 0] is the average moderator imbalance. This gives Equation (15).

A.2. Proof of Theorem 5.4 (Detailed)

Proof. Step 1: Normalize U. Without loss of generality, assume U is scaled so that Var(Ũ) = Var(U − E[U | X]) = 1, where Ũ denotes the X-residualized moderator. This is a normalization that simplifies the algebra.

Step 2: Relate R²_{τ∼U|X} to b. Under Theorem 5.1, after residualizing on X:

τ̃ = b · Ũ + ε_τ    (52)

where Cov(Ũ, ε_τ) = 0. Taking variances:

Var(τ̃) = b² Var(Ũ) + Var(ε_τ) = b² + Var(ε_τ)    (53)

The partial R² is:

R²_{τ∼U|X} = Var(b Ũ) / Var(τ̃) = b² / σ²_{τ|X}    (54)

where σ²_{τ|X} := Var(τ̃). Solving for |b|:

|b| = σ_{τ|X} √(R²_{τ∼U|X})    (55)

Step 3: Relate R²_{S∼U|X} to g. Similarly, under Theorem 5.1:

S̃ = g · Ũ + ε_S    (56)

The residual variance of S is:

Var(S̃) = Var(S − E[S | X]) = Var(S)(1 − R²_{S∼X})    (57)

The partial R² is:

R²_{S∼U|X} = Var(g Ũ) / Var(S̃) = g² / [Var(S)(1 − R²_{S∼X})]    (58)

Solving for |g|:

|g| = √(R²_{S∼U|X} · Var(S)(1 − R²_{S∼X}))    (59)

Step 4: Compute the imbalance coefficient. The coefficient δ in the linear projection of Ũ on S̃ is:

δ = Cov(Ũ, S̃) / Var(S̃)    (60)

Under Theorem 5.1:

Cov(Ũ, S̃) = Cov(Ũ, g Ũ + ε_S) = g Var(Ũ) = g    (61)

Therefore:

δ = g / Var(S̃) = g / [Var(S)(1 − R²_{S∼X})]    (62)

Taking absolute values:

|δ| = |g| / [Var(S)(1 − R²_{S∼X})] = √(R²_{S∼U|X} / [Var(S)(1 − R²_{S∼X})])    (63)

Under Theorem 5.2 and the centering E[U | X] = 0 from Theorem 4.3, we have Δ_U(X) ≡ Δ*_U = −δ.
Step 5: Combine for the bias bound. From Theorem 4.5 with constant β:

|τ* − τ_X| = |β| · |Δ*_U|    (64)

Under Theorem 4.3 with constant β, the linear projection coefficient satisfies b = β, and under Theorem 5.2 we have Δ*_U = −δ. Therefore:

|τ* − τ_X| = |b| · |δ|    (65)
           = σ_{τ|X} √(R²_{τ∼U|X}) · √(R²_{S∼U|X} / [Var(S)(1 − R²_{S∼X})])    (66)
           = σ_{τ|X} √(R²_{τ∼U|X} · R²_{S∼U|X} / [Var(S)(1 − R²_{S∼X})])    (67)

This is Equation (26).

B. Connection to General OVB Theory via Riesz Representers

Our main text focuses on a linear effect-moderation bridge, which yields the transparent identity τ* − τ_X = E[β(X) Δ_U(X) | S = 0]. A complementary (and more general) route is to view trial generalization as a covariate-shift problem and apply the general omitted-variable-bias (OVB) framework of Chernozhukov et al. (2024). We summarize the key mapping here.

B.1. TATE as a linear functional

Fix a treatment arm a ∈ {0, 1} and define the long conditional mean in the trial

m^ℓ_a(x, u) := E[Y | A = a, X = x, U = u, S = 1],

and the short regression that omits U,

m^s_a(x) := E[Y | A = a, X = x, S = 1] = E[m^ℓ_a(X, U) | X = x, S = 1].

Under Theorem 4.1, the same conditional mean function describes potential outcomes in the target as well. The target mean potential outcome can be written as the linear functional

θ*_a := E[Y(a) | S = 0] = E[m^ℓ_a(X, U) | S = 0],

and the TATE is τ* = θ*_1 − θ*_0.

B.2. Riesz representers and the role of trial participation

Parameters like θ*_a can be written as inner products between a regression function and a Riesz representer (RR). Intuitively, the RR plays the role of a weighting function that "moves" expectations from the observed trial distribution to the target distribution. In our setting, the RR is closely related to inverse odds of trial participation weights.
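To make the weighting interpretation concrete, here is a small self-contained Python sketch (our own illustration, not the paper's code) in which the participation probability P(S = 1 | X) is known, so the inverse-odds weights w(x) = P(S = 0 | x)/P(S = 1 | x) can be formed directly; reweighting the trial arm means by w pushes trial expectations onto the target covariate distribution, recovering the X-adjusted transported effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pooled data with a known participation model P(S = 1 | X) (illustrative).
n = 20000
X = rng.normal(size=n)
p_trial = 1.0 / (1.0 + np.exp(-(0.5 - 0.8 * X)))       # selection depends on X
S = rng.binomial(1, p_trial)
A = np.where(S == 1, rng.binomial(1, 0.5, size=n), 0)  # randomized in trial only
tau_of_x = 1.0 + 0.5 * X                               # CATE varies with X (no U here)
Y = X + A * tau_of_x + rng.normal(size=n)

# Inverse-odds-of-participation weights for trial units:
# w(x) = P(S=0 | x) / P(S=1 | x) -- the RR-style reweighting function.
w = (1.0 - p_trial) / p_trial
arm1 = (S == 1) & (A == 1)
arm0 = (S == 1) & (A == 0)
tau_hat_x = (np.average(Y[arm1], weights=w[arm1])
             - np.average(Y[arm0], weights=w[arm0]))

# Target-population benchmark E[tau(X) | S = 0]; with no omitted moderator,
# the weighted trial contrast should match it up to Monte Carlo error.
tau_target = tau_of_x[S == 0].mean()
```

In practice P(S = 1 | X) must itself be estimated (e.g. by logistic regression on the pooled sample), which is where omitting U distorts both the regression and the RR.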
When we omit U, both the regression m^ℓ_a and the RR can change, because U can explain additional variation in outcomes and in selection into the trial.

B.3. General OVB bound and partial R² reparameterization

Chernozhukov et al. (2024) show that for a broad class of causal targets (including covariate-shift policy effects), the OVB between a "short" functional (omitting U) and a "long" functional (including U) admits the generic form

OVB = Cov(ε_m, ε_α),

where ε_m and ε_α are approximation errors in the regression function and in the RR, respectively. By Cauchy–Schwarz,

|OVB| ≤ ‖ε_m‖_{L²} ‖ε_α‖_{L²},

and these L² norms can be reparameterized in terms of partial R² measures that quantify the incremental explanatory power of U for (i) the outcome regression and (ii) the RR. This yields scale-free sensitivity bounds that closely mirror Equation (26), while allowing for nonlinear nuisance functions estimated by modern machine learning. Developing the sharpest RR-based bounds and inference specifically tailored to τ* in non-nested trial generalization designs is a natural direction for our follow-up journal paper.

C. Additional Experimental Details

C.1. Simulation 1: Controlled Linear-Gaussian DGP

Data-generating process. We design a two-population simulation that directly induces an external-validity violation with known ground truth, satisfying Theorem 4.3 exactly. The trial population (S = 1, denoted r) and target population (S = 0, denoted o) have:

X_r ∼ N(0, I_p),   X_o ∼ N(µ_shift · 1_p, I_p),    (68)
U_s | X ∼ N(γ_s X_1, 1),   s ∈ {r, o}.    (69)

The key feature is that the unobserved moderator U has a different conditional distribution given X in the two populations: in the trial, E[U | X, S = 1] = γ_r X_1, while in the target, E[U | X, S = 0] = γ_o X_1. This creates the moderator imbalance Δ_U(X) = (γ_o − γ_r) X_1 that drives external-validity bias.
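This covariate-shift-plus-moderator-shift construction can be sketched numerically. The snippet below (with the illustrative values γ_r = 0, γ_o = 0.5, µ_shift = 0.5 used later in this section) checks that the induced average imbalance is Δ*_U = (γ_o − γ_r) · µ_shift, since E[X_1 | S = 0] = µ_shift.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100_000, 5
mu_shift, gamma_r, gamma_o = 0.5, 0.0, 0.5  # illustrative parameter values

# Covariates: standard normal in the trial (r), mean-shifted in the target (o).
X_r = rng.normal(size=(n, p))
X_o = rng.normal(loc=mu_shift, size=(n, p))

# Unobserved moderator: U | X ~ N(gamma_s * X_1, 1), with population-specific
# slope, inducing the X-adjusted imbalance Delta_U(X) = (gamma_o - gamma_r) * X_1.
U_r = gamma_r * X_r[:, 0] + rng.normal(size=n)
U_o = gamma_o * X_o[:, 0] + rng.normal(size=n)

# Average imbalance over the target distribution:
# Delta*_U = (gamma_o - gamma_r) * E[X_1 | S = 0] = (gamma_o - gamma_r) * mu_shift
delta_star_theory = (gamma_o - gamma_r) * mu_shift
delta_star_empirical = (gamma_o - gamma_r) * X_o[:, 0].mean()

# In this design the marginal gap E[U | S=0] - E[U | S=1] coincides with
# Delta*_U, because E[X_1 | S = 1] = 0 and the conditional means are linear.
marginal_gap = U_o.mean() - U_r.mean()
```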
Treatment is randomized within the trial: A ∼ Bernoulli(0.5) given S = 1. Potential outcomes follow a linear model with U as an effect modifier:

Y(a) = β_0 + β_X^⊤ X + a · (τ_0 + β_U · U) + ε,   ε ∼ N(0, σ²).    (70)

Figure 5. OVB sensitivity envelope for a single simulation. The baseline estimate τ̂_X is biased for the true TATE τ*, and the sensitivity interval expands with Γ. (Λ = 0.25, oracle; dashed = true TATE; dotted = true Γ.)

Parameter settings. We set n_trial = 2000, n_target = 5000, p = 5, µ_shift = 0.5, γ_r = 0, γ_o = 0.5, β_U = 0.5, τ_0 = 1, and σ = 1. This yields oracle moderation strength Γ* = β_U = 0.5 and oracle average imbalance Δ*_U = (γ_o − γ_r) µ_shift = 0.25, with true bias 0.125. In this Gaussian DGP, Δ_U(X) is unbounded, so the almost-sure bound Λ in Theorem 4.6 does not exist; we report results using Δ*_U as the imbalance scale.

Sensitivity envelope. Figure 5 illustrates the OVB sensitivity envelope for a single realization.

Bias-only envelope vs full sensitivity CI. The bias-only envelope accounts for systematic bias from omitted moderators but not sampling uncertainty in τ̂_X. Combining the OVB bound with a confidence interval for τ̂_X yields a full sensitivity CI; Figure 6 compares the two approaches.

C.2. Simulation 2: Nonlinear DGP

We relax the linear assumptions to test robustness using a nonlinear baseline and heterogeneous moderation:

• Nonlinear baseline: m(X) = β_0 + sin(π X_1 / 2) + 0.5 X_2²
• Heterogeneous moderation: β(X) = β_base + β_X X_1

This violates the linear effect-modification model in Theorem 4.3 and tests whether the bounds remain informative under misspecification.

C.3.
Simulation 3: High-Dimensional ML Setting

We consider a high-dimensional simulation with p = 50 covariates (only 10 relevant), nonlinear baseline outcomes, and logistic selection into the trial. We compare linear regression with interactions, LASSO, and random forest as baseline estimators.

Figure 6. Comparison of bias-only envelope vs full sensitivity CI coverage. The full CI incorporates bootstrap uncertainty in τ̂_X in addition to the OVB bias bound. (n_reps = 200; Λ = 0.25, oracle; B = 500 bootstrap samples.)

C.4. Comparison with Marginal Sensitivity Model

We compare our OVB envelope to a marginal sensitivity model (MSM) that bounds selection odds ratios rather than moderator strength.

C.5. Semi-Synthetic Benchmark and Robustness Visualization

We demonstrate benchmarking via a "hide one moderator" experiment: we generate data with five observed effect modifiers and then treat one covariate as unobserved, using its estimated partial-R² values as sensitivity parameters. Figure 10 shows the resulting benchmark scatter plot. Figure 11 visualizes the sign-reversal region for (R²_{τ∼U|X}, R²_{S∼U|X}); Figure 4 in the main text overlays observed covariates against the robustness threshold.

C.6. Computational Details

All experiments use 500 replications. Parallel computation uses min(n_cores − 1, 8) cores. Bootstrap confidence intervals use 1000 resamples.
Figure 7. Coverage under a nonlinear DGP with heterogeneous moderation. Coverage increases with Γ but does not reach 95% at Γ = 1, reflecting that a constant bound cannot fully capture X-varying moderation strength. (n_reps = 200; testing conservatism under misspecification.)

Figure 8. Coverage in a high-dimensional setting. OVB bounds correct for omitted moderators but do not correct baseline estimator misspecification. (p = 50 covariates, 10 relevant; n_reps = 100; Λ = 0.090, oracle; estimators: linear, LASSO, random forest.)

Figure 9. Comparison of OVB sensitivity bounds vs marginal sensitivity model bounds. (Dashed line = true TATE; both methods calibrated to a similar parameter scale.)

Figure 10. Covariate benchmarks for OVB sensitivity. Each point is an observed covariate, plotted by its partial R² with selection and with treatment effect. (Larger points = stronger joint association with selection and treatment effect.)
Figure 11. Robustness value contour. The red region shows (R²_{τ∼U|X}, R²_{S∼U|X}) combinations that would induce sign reversal. (RV = 0.292; points above the curve would reverse the sign of the effect.)
