Two-stage Estimation for Causal Inference Involving a Semi-continuous Exposure

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Methods for causal inference are well developed for binary and continuous exposures, but in many settings the exposure has a substantial mass at zero; such exposures are called semi-continuous. We propose a general causal framework for semi-continuous exposures, together with a novel two-stage estimation strategy. We introduce a two-part propensity structure for the semi-continuous exposure, with one component for exposure status (exposed vs unexposed) and another for the exposure level among those exposed, and incorporate both into a marginal structural model that disentangles the effects of exposure status and dose. The two-stage procedure sequentially targets the causal dose-response among exposed individuals and the causal effect of exposure status at a reference dose, allowing flexibility in the choice of propensity score methods in the second stage. We establish consistency and asymptotic normality for the resulting estimators, and characterise their limiting values under misspecification of the propensity score models. Simulation studies evaluate finite sample performance and robustness, and an application to a study of prenatal alcohol exposure and child cognition demonstrates how the proposed methods can be used to address a range of scientific questions about both exposure status and exposure intensity.


💡 Research Summary

This paper addresses a significant gap in causal inference methodology by developing a comprehensive framework for analyzing “semi-continuous exposures.” These are exposures, such as alcohol consumption or pollutant levels, where a substantial proportion of the study population has no exposure (a point mass at zero), while the exposed individuals exhibit a continuous range of exposure levels. Standard methods designed for purely binary or continuous exposures are inadequate for this common data structure.

The authors propose a novel two-part propensity score structure to model the dual nature of the exposure: one component models the probability of being exposed (binary indicator A), and the other models the level of exposure (dose D) among the exposed. This structure is integrated into a Marginal Structural Model (MSM) that explicitly separates the causal effect of exposure status (being exposed at a reference dose) from the causal dose-response effect (change in outcome per unit increase in dose among the exposed).
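As a rough illustration of the structure just described, the two propensity components and a linear marginal structural model could be written as follows. The notation here (covariates X, reference dose d_0, parameters \beta_0, \psi, \gamma) is assumed for this sketch and is not taken from the paper, whose exact parameterisation may differ.

```latex
% Hypothetical notation: A = exposure indicator, D = dose, X = covariates,
% d_0 = reference dose. The linear form of the MSM is an assumption.
\begin{align*}
  \pi(X) &= P(A = 1 \mid X)
    && \text{(propensity for exposure status)} \\
  f(d \mid X) &= \text{density of } D \text{ given } A = 1,\ X
    && \text{(propensity for dose among the exposed)} \\
  E\{Y(a, d)\} &= \beta_0 + a\,\{\psi + \gamma\,(d - d_0)\}
    && \text{(marginal structural model)}
\end{align*}
```

Under this sketch, \psi is the effect of being exposed at the reference dose d_0, and \gamma is the change in mean outcome per unit increase in dose among the exposed. For a = 0 the dose term drops out, so the unexposed mean is simply \beta_0.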

The core methodological innovation is a two-stage estimation procedure. In Stage I, the causal dose-response effect is estimated using propensity score regression adjustment, but only among the exposed individuals. In Stage II, the effect of exposure status at a specified reference dose (e.g., the average dose among the exposed) is estimated using the entire sample. Crucially, the dose-response estimate from Stage I is incorporated as a fixed offset in Stage II. This second stage offers flexibility, allowing analysts to use propensity score regression adjustment, Inverse Probability Weighting (IPW), or Augmented IPW (AIPW) for estimation, with AIPW providing double robustness properties.
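The two stages can be sketched on simulated data as below. This is a minimal toy version under stated assumptions, not the authors' implementation: a single confounder `x`, a linear outcome model, a fixed reference dose `d0`, Stage I done by regressing the outcome on dose and a fitted conditional mean dose (one common form of propensity score regression adjustment for a continuous exposure), and Stage II done with normalised IPW. All function names, variable names, and simulation settings are hypothetical.

```python
import numpy as np


def fit_logistic(X, a, iters=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = (X * (p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(hessian, X.T @ (a - p))
    return beta


def two_stage(y, a, d, x, d0):
    """Toy two-stage estimator; returns (gamma_hat, psi_hat)."""
    exposed = a == 1
    n1 = int(exposed.sum())

    # Stage I (exposed only): model dose given x, then regress the outcome
    # on centred dose and the fitted dose mean (assumed adjustment form).
    X1 = np.column_stack([np.ones(n1), x[exposed]])
    dose_coef, *_ = np.linalg.lstsq(X1, d[exposed], rcond=None)
    dose_score = X1 @ dose_coef
    design = np.column_stack([np.ones(n1), d[exposed] - d0, dose_score])
    coef, *_ = np.linalg.lstsq(design, y[exposed], rcond=None)
    gamma_hat = coef[1]

    # Stage II (full sample): treat gamma_hat * (d - d0) as a fixed offset,
    # so exposed outcomes are shifted to the reference dose.
    y_ref = y - a * gamma_hat * (d - d0)

    # Exposure-status propensity, then normalised (Hajek) IPW contrast.
    X_all = np.column_stack([np.ones(len(y)), x])
    e_hat = 1.0 / (1.0 + np.exp(-X_all @ fit_logistic(X_all, a)))
    w1 = a / e_hat
    w0 = (1 - a) / (1.0 - e_hat)
    psi_hat = np.sum(w1 * y_ref) / np.sum(w1) - np.sum(w0 * y_ref) / np.sum(w0)
    return gamma_hat, psi_hat


# Simulated data (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, d0 = 20000, 2.0
psi_true, gamma_true = 0.5, -0.8

x = rng.normal(size=n)                                   # confounder
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * x)))      # exposure status
d = np.where(a == 1, 2.0 + 0.5 * x + rng.normal(size=n), 0.0)  # dose, 0 if unexposed
y = 1.0 + x + a * (psi_true + gamma_true * (d - d0)) + rng.normal(size=n)

gamma_hat, psi_hat = two_stage(y, a, d, x, d0)
print(f"gamma_hat = {gamma_hat:.3f} (true {gamma_true})")
print(f"psi_hat   = {psi_hat:.3f} (true {psi_true})")
```

Swapping the last step of Stage II for outcome regression or an AIPW estimator would give the other second-stage variants mentioned above. The sketch gives no standard errors. Valid ones would have to account for the Stage I estimate being plugged into Stage II.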

The paper establishes the theoretical foundations for the proposed estimators, proving their consistency and asymptotic normality under correct model specification. A particularly valuable contribution is the characterization of the limiting values of the estimators when the propensity score models are misspecified, providing critical insight into the robustness and potential biases of the method.

Simulation studies evaluate the finite-sample performance of the estimators. The results indicate that, with a sufficiently large sample, the proposed methods yield minimal bias and coverage probabilities close to the nominal level. The AIPW-based second-stage estimator performs particularly well and remains robust across various scenarios of model misspecification.

The practical utility of the framework is illustrated through an application to a real-world study on the effect of prenatal alcohol exposure (PAE) on child cognition. The analysis reveals a nuanced picture: while exposure status alone (exposure at the average dose) did not show a statistically significant effect on cognitive scores, a significant negative dose-response relationship was identified among the exposed. This exemplifies how the method can answer distinct scientific questions (“Is any exposure harmful?” versus “How does risk change with increasing exposure?”) within a single, coherent analysis.

In conclusion, this work provides a rigorous, flexible, and practical toolbox for causal inference with semi-continuous exposures, bridging a critical methodological gap and enabling more nuanced analysis of complex exposure-outcome relationships in public health and social sciences.

