A novel decomposition to explain heterogeneity in observational and randomized studies of causality
This paper introduces a novel decomposition framework to explain heterogeneity in causal effects observed across different studies, considering both observational and randomized settings. We present a formal decomposition of between-study heterogeneity, identifying sources of variability in treatment effects across studies. The proposed methodology allows for robust estimation of causal parameters under various assumptions, addressing differences in pre-treatment covariate distributions, mediating variables, and the outcome mechanism. Our approach is validated through a simulation study and applied to data from the Moving to Opportunity (MTO) study, demonstrating its practical relevance. This work contributes to the broader understanding of causal inference in multi-study environments, with potential applications in evidence synthesis and policy-making.
💡 Research Summary
The paper proposes a novel, fully non‑parametric framework for decomposing between‑study heterogeneity in causal effects observed across both observational and randomized studies. Building on structural causal models (SCM), the authors define three distinct sources of heterogeneity: (1) case‑mix heterogeneity arising from differences in the distribution of pre‑treatment covariates (W), (2) mediator‑related heterogeneity caused by study‑specific distributions of post‑treatment mediators (M), and (3) pure effect modification, where the study indicator (S) itself exerts a direct influence on the outcome (Y) independent of treatment or mediators.
Mathematically, the overall difference in average treatment effects between the two studies (δ = ATE₁ − ATE₀) is first split into a case-mix component, δ_CM, and an effect-heterogeneity component, δ_EH. The term δ_EH is then decomposed further into a mediator-related component, δ_EM, and a pure-effect component, δ_PE. Each component equals zero under its own structural null hypothesis, that is, when the corresponding source of heterogeneity is absent.
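Written out, the nesting described above can be sketched as follows. This assumes the split is additive at each level, which the summary implies but does not state explicitly; the paper's formal definitions of each term in terms of counterfactual means are not reproduced here.

```latex
\begin{align*}
\delta &= \mathrm{ATE}_1 - \mathrm{ATE}_0 = \delta_{CM} + \delta_{EH}, \\
\delta_{EH} &= \delta_{EM} + \delta_{PE}, \\
\delta &= \delta_{CM} + \delta_{EM} + \delta_{PE}.
\end{align*}
```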
Identification relies on extensions of the familiar S-ignorability and S-admissibility assumptions used in the data-fusion literature. S-ignorability assumes that, conditional on W, counterfactual outcomes are independent of study selection; S-admissibility adds conditioning on the counterfactual mediator values. Additional assumptions (e.g., no unmeasured direct effect of S on Y) are required to isolate the pure effect term.
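In symbols, the two conditional-independence statements described above might be written as below. The counterfactual notation Y(a) and M(a), for the outcome and mediator under treatment level a, is an assumption made here for illustration; the paper's exact formulations, and its extensions of them, may differ.

```latex
\begin{align*}
\text{S-ignorability:} \quad & Y(a) \perp\!\!\!\perp S \mid W, \\
\text{S-admissibility:} \quad & Y(a) \perp\!\!\!\perp S \mid W,\, M(a).
\end{align*}
```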
The authors develop a non‑parametric estimator that combines overlap‑weighting for covariate balance with a sequential re‑weighting of mediator distributions, allowing each component of the decomposition to be estimated without imposing parametric models. They employ kernel density estimation and cross‑fitted conditional outcome regressions to achieve double‑robustness and mitigate over‑fitting.
A comprehensive simulation study varies the three sources of heterogeneity independently and jointly. Results show that the proposed estimator accurately recovers each component, with substantially lower bias and mean‑squared error than methods that ignore mediator differences or treat heterogeneity as a single block.
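To make the case-mix versus effect-heterogeneity split concrete, here is a minimal Python sketch on simulated data. It is not the authors' estimator: it assumes two randomized studies, a single binary covariate W, no mediator, and a simple plug-in (stratified) estimate, and it standardizes study 1's stratum effects to study 0's covariate distribution, which is one possible choice of reference. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_study(n, p_w, effects, rng):
    """Simulate one randomized study with a binary covariate W.

    p_w     : P(W = 1) in this study (drives case-mix differences)
    effects : (effect when W = 0, effect when W = 1) in this study
    """
    w = rng.binomial(1, p_w, size=n)
    a = rng.binomial(1, 0.5, size=n)  # randomized treatment assignment
    tau = np.where(w == 1, effects[1], effects[0])
    y = 1.0 + 0.5 * w + tau * a + rng.normal(0.0, 1.0, size=n)
    return w, a, y

def stratum_effects(w, a, y):
    """Treated-minus-control mean outcome within W = 0 and W = 1."""
    return np.array([
        y[(w == k) & (a == 1)].mean() - y[(w == k) & (a == 0)].mean()
        for k in (0, 1)
    ])

def covariate_dist(w):
    """Empirical distribution of W as [P(W = 0), P(W = 1)]."""
    return np.array([np.mean(w == 0), np.mean(w == 1)])

def decompose(study0, study1):
    """Split ATE_1 - ATE_0 into case-mix and effect-heterogeneity parts."""
    cate0, cate1 = stratum_effects(*study0), stratum_effects(*study1)
    pw0, pw1 = covariate_dist(study0[0]), covariate_dist(study1[0])
    ate0 = cate0 @ pw0
    ate1 = cate1 @ pw1
    # Study 1's stratum effects averaged over study 0's covariate distribution.
    ate1_std = cate1 @ pw0
    return {
        "delta": ate1 - ate0,
        "delta_cm": ate1 - ate1_std,   # same effects, different covariate mix
        "delta_eh": ate1_std - ate0,   # same covariate mix, different effects
    }

rng = np.random.default_rng(0)
study0 = simulate_study(200_000, p_w=0.3, effects=(1.0, 2.0), rng=rng)
study1 = simulate_study(200_000, p_w=0.7, effects=(1.5, 2.5), rng=rng)
result = decompose(study0, study1)
print(result)
```

With these settings the population values are δ_CM = 0.4 and δ_EH = 0.5, so δ = 0.9; the printed estimates should be close to these but carry sampling noise. Standardizing to study 1's covariate distribution instead would generally give a different split, which is why a decomposition of this kind has to fix its reference population explicitly.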
The methodology is applied to the Moving to Opportunity (MTO) randomized housing-voucher trial, which was conducted in five U.S. cities. After excluding Baltimore because of a concurrent intervention, the authors analyze city-specific effects on adolescent psychiatric diagnoses, using school poverty as the mediator. The decomposition indicates that differences across cities are driven largely by (i) variation in the mediator (school poverty) and (ii) a direct city-level effect, while differences in covariate distributions play a minor role.
Finally, the paper extends the framework to more than two studies, presenting a hierarchical decomposition that simultaneously estimates heterogeneity contributions across multiple sites. This extension enables evidence synthesis and meta‑analysis to move beyond simple heterogeneity statistics (e.g., I²) toward a mechanistic understanding of why effects differ.
Overall, the work offers a rigorous, flexible tool for researchers and policymakers to dissect and quantify the origins of effect heterogeneity in multi‑study settings, facilitating more informed decisions about when and how to transport causal findings across populations.