Causal Inference for Preprocessed Outcomes with an Application to Functional Connectivity


In biomedical research, repeated measurements within each subject are often processed to remove artifacts and unwanted sources of variation. The resulting data are used to construct derived outcomes that act as proxies for scientific outcomes that are not directly observable. Although intra-subject processing is widely used, its impact on inter-subject statistical inference has not been systematically studied, and a principled framework for causal analysis in this setting is lacking. In this article, we propose a semiparametric framework for causal inference with derived outcomes obtained after intra-subject processing. This framework applies to settings with a modular structure, where intra-subject analyses are conducted independently across subjects and are followed by inter-subject analyses based on parameters from the intra-subject stage. We develop multiply robust estimators of causal parameters under rate conditions on both intra-subject and inter-subject models, which allows the use of flexible machine learning. We specialize the framework to a mediation setting and focus on the natural direct effect. For high dimensional inference, we employ a step-down procedure that controls the exceedance rate of the false discovery proportion. Simulation studies demonstrate the superior performance of the proposed approach. We apply our method to estimate the impact of stimulant medication on brain connectivity in children with autism spectrum disorder.


💡 Research Summary

The paper addresses a pervasive yet under‑examined problem in biomedical research: the impact of subject‑level preprocessing (e.g., motion correction, artifact regression) on downstream causal inference when the final outcomes are derived rather than directly observed. Using functional magnetic resonance imaging (fMRI) as a motivating example, the authors note that repeated measurements within each participant are routinely cleaned, filtered, or even partially discarded before any inter‑subject analysis. While such intra‑subject pipelines are essential for obtaining reliable derived outcomes such as functional connectivity (FC), they introduce an additional source of estimation error that standard causal‑inference theory does not accommodate.

To fill this gap, the authors propose a hierarchical, semiparametric framework that explicitly models both the intra‑subject processing stage and the inter‑subject causal stage. At the first level, each subject’s raw signals \(X_t\) and nuisance variables \(H_t\) are transformed by a set of subject‑specific functions \(f\) (e.g., regression on motion parameters) and then aggregated by a deterministic mapping \(g\) (e.g., Fisher‑z transformed Pearson correlations) to produce a vector of derived outcomes \(Y\). The second level treats the treatment \(A\) (stimulant medication), a mediator \(M\) (a motion trait such as mean framewise displacement), and the derived outcome \(Y\) within a causal mediation diagram. The primary causal quantities of interest are the average treatment effect (ATE), the natural direct effect (NDE), and the natural indirect effect (NIE), all expressed as contrasts of the functional \(\psi(a,a') = E\)…
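The intra‑subject stage described above can be illustrated with a minimal sketch for a single subject. It assumes that the subject‑specific step is an ordinary least‑squares regression of each regional signal on the nuisance variables (one of the examples the summary mentions, not necessarily the authors' exact pipeline), and that the aggregation step is the Fisher‑z transform of the pairwise Pearson correlations of the residuals. The function name `derived_outcomes` and the array layout (time points in rows) are assumptions made for illustration.

```python
import numpy as np


def derived_outcomes(X, H):
    """Illustrative intra-subject pipeline for one subject (hypothetical helper).

    X : (T, V) array of raw signals, one column per region.
    H : (T, p) array of nuisance variables (e.g., motion parameters).
    Returns a vector of V*(V-1)/2 Fisher-z transformed correlations.
    """
    T, V = X.shape
    # Nuisance regression step: OLS of every signal on an intercept plus H.
    design = np.column_stack([np.ones(T), H])
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    resid = X - design @ beta
    # Aggregation step: Pearson correlations of the cleaned signals,
    # keeping only the upper triangle (each region pair once).
    r = np.corrcoef(resid, rowvar=False)
    iu = np.triu_indices(V, k=1)
    # Fisher-z transform; clip to keep arctanh finite if |r| reaches 1.
    return np.arctanh(np.clip(r[iu], -1 + 1e-12, 1 - 1e-12))


# Toy usage: signals whose only shared variation comes from the nuisance terms.
rng = np.random.default_rng(0)
H_demo = rng.standard_normal((500, 2))
X_demo = 3.0 * (H_demo @ rng.standard_normal((2, 4))) + rng.standard_normal((500, 4))
Y_demo = derived_outcomes(X_demo, H_demo)
```

In this toy setup the entries of `Y_demo` are estimates, not the true connectivity values. That estimation error in the derived outcome is the extra source of uncertainty the framework is designed to account for at the inter‑subject stage.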

