Bounding causal effects with an unknown mixture of informative and non-informative missingness

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

In experimental and observational data settings, researchers often have limited knowledge of the reasons for missing outcomes. To address this uncertainty, we propose bounds on causal effects for missing outcomes, accommodating the scenario where missingness is an unobserved mixture of informative and non-informative components. Within this mixed missingness framework, we explore several assumptions to derive bounds on causal effects, including bounds expressed as a function of user-specified sensitivity parameters. We develop influence-function based estimators of these bounds to enable flexible, non-parametric, and machine learning based estimation, achieving root-n convergence rates and asymptotic normality under relatively mild conditions. We further consider the identification and estimation of bounds for other causal quantities that remain meaningful when informative missingness reflects a competing outcome, such as death. We conduct simulation studies and illustrate our methodology with a study on the causal effect of antipsychotic drugs on diabetes risk using a health insurance dataset.
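The following sketch shows how a bound can be indexed by a user-specified sensitivity parameter. It is a deliberately simplified illustration, not the estimator developed in the paper. It assumes that (i) treatment is randomized and there are no covariates, so E[Y(a)] = E[Y | A = a]; (ii) outcomes lie in [0, 1]; (iii) outcomes missing for non-informative reasons have the same mean as the observed outcomes in their arm; and (iv) a hypothetical parameter `delta_max` caps the share of missing outcomes that are informative. Here C = 1 marks a missing outcome and U_I = 1 an informatively missing one. The names `ArmSummary`, `mean_bounds`, and `ate_bounds` are illustrative.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ArmSummary:
    """Observed-data summaries for one treatment arm (hypothetical container)."""
    mu_obs: float  # mean of the observed outcomes in this arm (outcomes in [0, 1])
    p_miss: float  # P(C = 1 | A = a): share of outcomes missing in this arm


def mean_bounds(arm: ArmSummary, delta_max: float) -> Tuple[float, float]:
    """Bounds on E[Y(a)] under the simplifying assumptions stated above.

    E[Y(a)] = (1 - q) * mu_obs + q * m_I, where q = P(U_I = 1 | A = a) lies in
    [0, p_miss * delta_max] and m_I = E[Y | U_I = 1, A = a] lies in [0, 1].
    The lower bound decreases and the upper bound increases in q, so both
    extremes are attained at q = p_miss * delta_max.
    """
    if not 0.0 <= delta_max <= 1.0:
        raise ValueError("delta_max must lie in [0, 1]")
    q = arm.p_miss * delta_max       # largest allowed informative share
    lower = (1.0 - q) * arm.mu_obs   # every informatively missing Y equals 0
    upper = lower + q                # every informatively missing Y equals 1
    return lower, upper


def ate_bounds(treated: ArmSummary, control: ArmSummary,
               delta_max: float) -> Tuple[float, float]:
    """Bounds on E[Y(1)] - E[Y(0)] obtained by pairing opposite arm extremes."""
    l1, u1 = mean_bounds(treated, delta_max)
    l0, u0 = mean_bounds(control, delta_max)
    return l1 - u0, u1 - l0
```

Setting `delta_max = 0` treats all missingness as non-informative and collapses the interval to the complete-case difference in means, while `delta_max = 1` lets every missing outcome be informative and recovers worst-case (Manski-type) bounds for a bounded outcome; intermediate values trace out how conclusions depend on the assumed informative share.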


💡 Research Summary

This paper tackles a pervasive problem in causal inference: outcomes are often missing, and the missingness mechanism may be a mixture of non‑informative (ignorable) and informative (non‑ignorable) processes. When the missingness is purely non‑informative, the standard “missing at random” (MAR) assumption enables point identification of causal estimands such as the average treatment effect (ATE). However, in many real‑world settings (randomized trials with adverse events, observational studies using electronic health records, or insurance claims data), some missing outcomes are plausibly related to the unobserved outcomes themselves; for example, patients may drop out because of side effects that also affect the outcome. The authors formalize this situation by introducing two latent binary indicators: \(U_{NI}\) for non‑informative missingness and \(U_{I}\) for informative missingness. The observed missingness indicator is their logical OR, \(C = U_{NI} \lor U_{I}\), and the two mechanisms are assumed to be mutually exclusive for conceptual clarity.
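The mixture structure can be illustrated with a small simulation. The data-generating process below is hypothetical, not taken from the paper: informative missingness occurs only among units with Y = 1, and non-informative missingness is drawn independently of Y among units that are not informatively missing, so the two latent indicators are mutually exclusive and C = U_NI OR U_I. The function names `simulate` and `summarize` and the default probabilities are illustrative assumptions.

```python
import random
from statistics import fmean


def simulate(n, p_inf=0.4, p_noninf=0.25, seed=0):
    """Draw (y, u_ni, u_i, c) tuples from a hypothetical mixed-missingness DGP."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        # Informative component: outcomes with y == 1 go missing with prob p_inf.
        u_i = 1 if (y == 1 and rng.random() < p_inf) else 0
        # Non-informative component: among units not informatively missing,
        # drop independently of y, which keeps the two mechanisms exclusive.
        u_ni = 1 if (u_i == 0 and rng.random() < p_noninf) else 0
        c = u_ni | u_i  # observed missingness indicator: logical OR
        rows.append((y, u_ni, u_i, c))
    return rows


def summarize(rows):
    """Return (full-data mean, complete-case mean, mean among U_NI = 1)."""
    full = fmean(y for y, _, _, _ in rows)
    complete_case = fmean(y for y, _, _, c in rows if c == 0)
    non_inf = fmean(y for y, u_ni, _, _ in rows if u_ni == 1)
    return full, complete_case, non_inf


rows = simulate(200_000, seed=1)
full, complete_case, non_inf = summarize(rows)
print(f"full={full:.3f} complete-case={complete_case:.3f} U_NI=1: {non_inf:.3f}")
```

With the defaults the full-data mean is 0.5 in expectation, while both the complete-case mean and the mean among non-informatively missing units are 0.5 × 0.6 / 0.8 = 0.375. The non-informative component alone would be harmless, but the informative component biases any analysis that treats all missingness as MAR. In real data only C is observed, not the latent indicators, which is why the mixture proportion is unknown.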

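The primary causal target is the ATE, \(E[Y(1) - Y(0)]\), the difference between the mean potential outcomes under treatment and under control.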
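When all missingness is non-informative (MAR given treatment A and covariates X), E[Y(a)] and hence the ATE are point identified, and a standard influence-function-based estimator is the augmented inverse-probability-weighted (AIPW) one-step estimator. The sketch below is that textbook estimator for the MAR case, not the paper's estimator of the bounds. It assumes a single discrete covariate, positivity in every stratum, and saturated (stratum-frequency) nuisance estimates; the names `eif_values` and `aipw_ate` and the `(x, treat, observed, y)` row layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, variance


def eif_values(data, a):
    """Uncentered influence-function values for E[Y(a)] under MAR given (A, X).

    data: list of (x, treat, observed, y) rows with discrete x; y may be None
    when observed == 0. Nuisances are saturated stratum frequencies.
    """
    n_x, n_xa, n_xao = defaultdict(int), defaultdict(int), defaultdict(int)
    sum_y = defaultdict(float)
    for x, t, obs, y in data:
        n_x[x] += 1
        if t == a:
            n_xa[x] += 1
            if obs:
                n_xao[x] += 1
                sum_y[x] += y
    values = []
    for x, t, obs, y in data:
        mu = sum_y[x] / n_xao[x]  # estimate of E[Y | A=a, C=0, X=x]
        pi = n_xa[x] / n_x[x]     # estimate of P(A=a | X=x)
        r = n_xao[x] / n_xa[x]    # estimate of P(C=0 | A=a, X=x)
        # Weighted residual is nonzero only for units with A=a and Y observed.
        resid = y - mu if (t == a and obs) else 0.0
        values.append(mu + resid / (pi * r))
    return values


def aipw_ate(data):
    """One-step estimate of E[Y(1)] - E[Y(0)] with a Wald standard error."""
    diff = [p1 - p0 for p1, p0 in zip(eif_values(data, 1), eif_values(data, 0))]
    est = fmean(diff)
    se = (variance(diff) / len(diff)) ** 0.5
    return est, se
```

With saturated nuisances the weighted residuals sum to zero within each stratum, so the point estimate coincides with the stratified plug-in estimate. The influence-function form matters when the outcome, treatment, and missingness models are fit flexibly (for example with machine learning and cross-fitting), which is the setting the abstract targets.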

