Demystify Doubly-Robust Estimation: The Role of Overlap

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

The doubly-robust (DR) estimator is popular for evaluating causal effects in observational studies and is often perceived as more desirable than inverse probability weighting (IPW) or outcome modeling alone because it provides extra protection against model misspecification. However, double robustness is an asymptotic property that may not hold in finite samples. We investigate how the finite sample performance of the DR estimator depends on the degree of covariate overlap between comparison groups. Using analytical illustrations and extensive simulations under various scenarios with different degrees of covariate overlap and model specifications, we examine the bias and variance of the DR estimator relative to IPW and outcome modeling estimators. We find that: (i) specification of the outcome model has a stronger influence on the DR estimates than specification of the propensity score model, and this dominance increases as overlap decreases; (ii) with poor overlap, the DR estimator generally amplifies the adverse consequences of extreme weights (large bias and/or variance) regardless of model specifications, and is often inferior to both the IPW and outcome modeling estimators. As a practical guide, we recommend always first checking the degree of overlap in applications. In the case of poor overlap, analysts should consider shifting the target population to a subpopulation with adequate overlap via methods such as trimming or overlap weighting.


💡 Research Summary

This paper provides a critical examination of the finite-sample performance of the doubly-robust (DR) estimator for causal effects, challenging the common perception of its universal superiority. While the DR estimator’s theoretical appeal lies in its double robustness—consistency if either the propensity score (PS) model or the outcome model is correctly specified—the authors demonstrate that this is an asymptotic property highly dependent on covariate overlap between treatment groups in practical, finite-sample settings.
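For concreteness, the DR estimator discussed here is typically written in its augmented IPW (AIPW) form. The notation below is a standard convention assumed for this summary, not lifted from the paper: $\hat{e}$ is the estimated propensity score and $\hat{m}_0, \hat{m}_1$ are the fitted outcome models for control and treatment.

```latex
\hat{\tau}_{\mathrm{DR}}
= \frac{1}{n}\sum_{i=1}^{n}\left[
    \hat{m}_1(X_i) - \hat{m}_0(X_i)
    + \frac{T_i\,\bigl(Y_i - \hat{m}_1(X_i)\bigr)}{\hat{e}(X_i)}
    - \frac{(1-T_i)\,\bigl(Y_i - \hat{m}_0(X_i)\bigr)}{1-\hat{e}(X_i)}
  \right]
```

The augmentation terms divide outcome-model residuals by $\hat{e}$ or $1-\hat{e}$, which is exactly where poor overlap enters: near-zero denominators magnify any residual error.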

The study employs a combination of analytical decomposition and extensive simulation. The key insight comes from expressing the finite-sample error of the DR estimator relative to the sample average treatment effect (SATE). This error term reveals that for units with extreme estimated propensity scores (near 0 or 1, indicating poor overlap), any misspecification error in the outcome model is dramatically amplified by the inverse probability weights. This exposes a vulnerability: the DR estimator inherits and can even exacerbate the instability of inverse probability weighting (IPW) in regions of limited overlap.
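The amplification mechanism can be illustrated with a small simulation. This is a hypothetical sketch under assumed data-generating choices, not the paper's actual simulation design: the true propensity score is plugged into the AIPW formula, the outcome models are deliberately misspecified (intercept-only), and a single parameter `alpha` controls how extreme the propensity scores become.

```python
import numpy as np

rng = np.random.default_rng(0)

def dr_estimate(alpha, n=2000):
    """One AIPW/DR estimate under a misspecified outcome model.

    alpha controls overlap: larger alpha pushes propensity scores
    toward 0 and 1 (poorer overlap). The true effect is 1.
    """
    x = rng.normal(size=n)
    e = 1.0 / (1.0 + np.exp(-alpha * x))   # true propensity score
    t = rng.binomial(1, e)
    y = x + t + rng.normal(size=n)         # outcome; true effect = 1
    # Deliberately misspecified outcome models: omit x entirely.
    m1 = np.full(n, y[t == 1].mean())
    m0 = np.full(n, y[t == 0].mean())
    # AIPW: residuals (y - m) are scaled by 1/e and 1/(1 - e), so
    # outcome-model error is amplified wherever e is extreme.
    return np.mean(m1 - m0
                   + t * (y - m1) / e
                   - (1 - t) * (y - m0) / (1 - e))

good = [dr_estimate(0.5) for _ in range(200)]  # good overlap
poor = [dr_estimate(4.0) for _ in range(200)]  # poor overlap
print(np.std(good), np.std(poor))
```

Across replications, the spread of the DR estimates is far larger in the poor-overlap setting, even though the same (misspecified) outcome model is used in both.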

Simulations were conducted under various scenarios manipulating the degree of covariate overlap (good vs. poor) and treatment prevalence (balanced ~0.4 vs. imbalanced ~0.1). Performance was evaluated for the DR, IPW, outcome regression, and overlap weighting estimators, as well as DR with propensity score trimming. The scenarios also included model misspecification (wrong functional form or omitted variable) in either the PS or outcome model.

The core findings are twofold. First, the specification of the outcome model exerts a stronger influence on the DR estimates than the specification of the PS model. This dominance becomes more pronounced as covariate overlap decreases. Second, in the presence of poor overlap, the DR estimator generally amplifies the adverse consequences of extreme weights, leading to large bias and/or variance. Consequently, under poor overlap, the DR estimator often performs worse than both the IPW and the outcome modeling estimators alone, contrary to its theoretical promise. Trimming extreme weights or using overlap weighting substantially improves the performance of DR and other weighting estimators in these challenging scenarios.
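A hedged sketch of the two remedies mentioned, applied to a hypothetical vector of estimated propensity scores: trimming discards units with extreme scores before weighting, while overlap weighting replaces the inverse-probability weights 1/e and 1/(1-e) with the bounded weights 1-e (treated) and e (control).

```python
import numpy as np

def trim(y, t, e, lo=0.1, hi=0.9):
    """Restrict to units whose estimated PS lies in [lo, hi]."""
    keep = (e >= lo) & (e <= hi)
    return y[keep], t[keep], e[keep]

def ipw_estimate(y, t, e):
    """Hajek-style (normalized) IPW estimate of the ATE."""
    w1, w0 = t / e, (1 - t) / (1 - e)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def overlap_weighted_estimate(y, t, e):
    """Overlap weighting: weights are bounded in [0, 1], so no
    single unit can dominate even when overlap is poor."""
    w1, w0 = t * (1 - e), (1 - t) * e
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
```

Both remedies change the estimand: trimming targets the trimmed subpopulation, and overlap weighting targets the "overlap population" in which units are weighted by e(1-e).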

As a major practical contribution, the authors recommend that applied researchers should always begin causal analyses by diagnosing the degree of covariate overlap, using visual tools or quantitative metrics. If overlap is found to be poor, they advise shifting the inferential target from the average treatment effect (ATE) for the entire population to an effect for a subpopulation with adequate overlap, using methods like trimming or overlap weighting. This pragmatic approach prioritizes stable and precise estimation over strict adherence to the original, but infeasible, ATE target when the data lack the necessary common support. The paper effectively demystifies the DR estimator, highlighting that its celebrated robustness is conditional on sufficient overlap and that careful data examination is paramount before its application.
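The recommended first step can be operationalized with a simple quantitative check alongside the usual side-by-side PS histograms. The 0.1/0.9 thresholds below are a common convention, assumed here rather than prescribed by the paper:

```python
import numpy as np

def overlap_report(e, t, lo=0.1, hi=0.9):
    """Summarize covariate overlap from estimated propensity scores.

    Reports the PS range within each treatment group and the
    overall fraction of units with extreme scores.
    """
    extreme = np.mean((e < lo) | (e > hi))
    return {
        "ps_range_treated": (e[t == 1].min(), e[t == 1].max()),
        "ps_range_control": (e[t == 0].min(), e[t == 0].max()),
        "frac_extreme": extreme,
    }
```

A large `frac_extreme`, or treated/control PS ranges that barely intersect, signals that the full-population ATE may be infeasible and that a shifted target (via trimming or overlap weighting) is advisable.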

