Multivariate Causal Effects: a Bayesian Causal Regression Factor Model
Wildfire smoke is a growing concern for air quality: it contributes to air pollution through a complex mixture of chemical species with important implications for public health. While previous studies have focused primarily on its association with total fine particulate matter (PM2.5), the causal relationship between wildfire smoke and the chemical composition of PM2.5 remains largely unexplored. Exposure to these chemical mixtures plays a critical role in public health, yet estimating how smoke affects them requires statistical methods that can model the complex dependencies among chemical species. To fill this gap, we propose a Bayesian causal regression factor model that estimates the multivariate causal effects of wildfire smoke on the concentrations of 27 chemical species in PM2.5 across the United States. Our approach introduces two key innovations: (i) a causal inference framework for multivariate potential outcomes, and (ii) a novel Bayesian factor model that employs a probit stick-breaking process as a prior for treatment-specific factor scores. By focusing on factor scores, our method addresses the missing data challenge common in causal inference and enables a flexible, data-driven characterization of the latent factor structure, which is crucial for capturing the complex correlation among multivariate outcomes. Through Monte Carlo simulations, we show the model’s accuracy in estimating multivariate causal effects and characterizing the treatment-specific latent structure. Finally, we apply our method to US air quality data, estimating the causal effect of wildfire smoke on 27 chemical species in PM2.5 and providing a deeper understanding of their interdependencies.
💡 Research Summary
The paper introduces a novel Bayesian causal regression factor model designed to estimate the multivariate causal effects of wildfire smoke on the concentrations of 27 chemical species in PM2.5 across the United States. Building on the Rubin causal model, the authors define a binary treatment (presence or absence of wildfire smoke), a set of observed covariates, and a q-dimensional outcome vector for each observational unit. They adopt the Stable Unit Treatment Value Assumption (SUTVA), positivity, and a conditional ignorability assumption that conditions on both the observed covariates and the treatment-specific latent factor scores. A key innovation is the treatment-specific prior for the latent factor scores: a probit stick-breaking construction embedded within a Dependent Dirichlet Process (DDP). This prior yields an infinite mixture of Gaussian components whose mixing weights depend on covariates, allowing the model to capture non-linear heterogeneity in the latent factor distribution across units and treatment arms.
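The covariate-dependent mixture prior described above can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the standard probit stick-breaking form w_k(X) = Φ(α_k + β_k^T X) ∏_{l<k} [1 − Φ(α_l + β_l^T X)], truncates the infinite mixture at K components (closing the last stick so the weights sum to one), and holds the Gaussian component means and scales fixed. The function names (probit_stick_breaking_weights, sample_factor_scores), the truncation level K, and all parameter values are hypothetical.

```python
import numpy as np
from math import erf, sqrt

# Standard normal CDF Phi, vectorized over arrays (scipy.stats.norm.cdf would also work).
_erf = np.vectorize(erf, otypes=[float])


def probit(x):
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / sqrt(2.0)))


def probit_stick_breaking_weights(X, alpha, beta):
    """w_ik = Phi(alpha_k + beta_k^T X_i) * prod_{l<k} [1 - Phi(alpha_l + beta_l^T X_i)].

    X: (n, p) covariates; alpha: (K,) intercepts; beta: (K, p) slopes.
    Assumption: the infinite mixture is truncated at K sticks and the last
    stick takes all remaining mass, so every row of weights sums to one.
    """
    v = probit(alpha[None, :] + X @ beta.T)  # (n, K) stick fractions in (0, 1)
    v[:, -1] = 1.0                            # close the stick at component K
    # leftover[:, k] = prod_{l<k} (1 - v[:, l]); the empty product for k = 0 is 1.
    leftover = np.cumprod(np.hstack([np.ones((X.shape[0], 1)), 1.0 - v[:, :-1]]), axis=1)
    return v * leftover


def sample_factor_scores(X, alpha, beta, means, sds, rng):
    """Draw L_i from sum_k w_ik(X_i) N(means[k], diag(sds[k] ** 2)).

    Assumption: component means/scales, each (K, J), are fixed here; a full
    DDP would place priors on them as well.
    """
    w = probit_stick_breaking_weights(X, alpha, beta)
    K = w.shape[1]
    # Inverse-CDF draw of one mixture label per unit (the clip guards float round-off).
    u = rng.random(X.shape[0])
    z = np.minimum((u[:, None] > np.cumsum(w, axis=1)).sum(axis=1), K - 1)
    L = means[z] + sds[z] * rng.standard_normal(means[z].shape)
    return L, z


# Hypothetical sizes and parameters; K is the truncation level.
rng = np.random.default_rng(1)
n, p, K, J = 500, 2, 10, 3
X = rng.standard_normal((n, p))
alpha = rng.normal(0.0, 1.0, K)
beta = rng.normal(0.0, 1.0, (K, p))
means = rng.normal(0.0, 2.0, (K, J))
sds = np.full((K, J), 0.5)

w = probit_stick_breaking_weights(X, alpha, beta)
L, z = sample_factor_scores(X, alpha, beta, means, sds, rng)
print("row sums:", w.sum(axis=1)[:3])
print("label counts:", np.bincount(z, minlength=K))
```

Because each stick fraction depends on X_i, units with different covariates place their mass on different Gaussian components, which is how this kind of prior lets the factor-score distribution vary non-linearly across units.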
The outcome model for each treatment level t (t = 0, 1) is a multivariate normal regression: Y_i | X_i, T_i = t, L_{it} ∼ N_q(μ_t + B_t X_i + Λ_t L_{it}, Ψ_t), where Λ_t is a q × J_t factor loading matrix, L_{it} ∈ ℝ^{J_t} are treatment-specific factor scores, and Ψ_t is a diagonal error covariance matrix. Conjugate normal priors are placed on μ_t and B_t, while standard priors are used for Λ_t and Ψ_t. The DDP prior on L_{it} is defined via a probit link, with covariate-dependent stick-breaking weights w_{ik}(X_i) = Φ(α_k + β_k^T X_i) ∏_{l<k} [1 − Φ(α_l + β_l^T X_i)], where Φ is the standard normal cumulative distribution function.
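To make the treatment-specific outcome model and the missing-data problem concrete, the sketch below simulates both potential outcomes from the factor regression, assigns a confounded binary treatment, and compares the sample average causal effect vector with a naive difference in means. It is a hypothetical simulation, not the paper's data or estimator: the dimensions, parameter values (make_arm, draw_outcomes are illustrative names), and treatment mechanism are invented, the factor scores are drawn from N(0, I) instead of the probit stick-breaking prior, and the two arms' scores are drawn independently. Under these assumptions, integrating out L_{it} gives the marginal covariance Λ_t Λ_t^T + Ψ_t for Y_i(t) given X_i, which the script checks against the empirical covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 20_000, 3, 5   # units, covariates, outcomes (hypothetical; the application has q = 27)
J = {0: 2, 1: 3}         # hypothetical treatment-specific factor counts J_t

X = rng.standard_normal((n, p))


def make_arm(J_t, shift):
    """Hypothetical parameters (mu_t, B_t, Lambda_t, diag(Psi_t)) for one treatment arm."""
    return {
        "mu": np.full(q, shift),
        "B": rng.normal(0.0, 0.3, size=(q, p)),
        "Lam": rng.normal(0.0, 0.7, size=(q, J_t)),
        "psi": rng.uniform(0.1, 0.5, size=q),  # diagonal of Psi_t
    }


params = {t: make_arm(J[t], shift=float(t)) for t in (0, 1)}


def draw_outcomes(X, par):
    """Y | X, L ~ N_q(mu + B X + Lam L, Psi); assumption: L ~ N(0, I) stands in for the DDP prior."""
    L = rng.standard_normal((X.shape[0], par["Lam"].shape[1]))
    eps = rng.standard_normal((X.shape[0], q)) * np.sqrt(par["psi"])
    return par["mu"] + X @ par["B"].T + L @ par["Lam"].T + eps


# Both potential outcomes exist in the simulation; only one is ever observed per unit.
Y0 = draw_outcomes(X, params[0])
Y1 = draw_outcomes(X, params[1])

# Hypothetical confounded assignment: treatment is more likely when X[:, 0] is high.
prob_T = 1.0 / (1.0 + np.exp(-X[:, 0]))
T = rng.random(n) < prob_T
Y_obs = np.where(T[:, None], Y1, Y0)

# q-dimensional sample average treatment effect vs. naive difference in observed means.
ate_true = (Y1 - Y0).mean(axis=0)
ate_naive = Y_obs[T].mean(axis=0) - Y_obs[~T].mean(axis=0)

# Marginal covariance of Y(1) given X implied by L ~ N(0, I), vs. its empirical counterpart.
cov_implied = params[1]["Lam"] @ params[1]["Lam"].T + np.diag(params[1]["psi"])
resid = Y1 - (params[1]["mu"] + X @ params[1]["B"].T)
cov_empirical = np.cov(resid, rowvar=False)

print("sample ATE:", np.round(ate_true, 2))
print("naive diff:", np.round(ate_naive, 2))
print("max |cov gap|:", np.abs(cov_implied - cov_empirical).max().round(3))
```

With real data only ate_naive is computable, because Y0 and Y1 are never observed together; any systematic gap between it and ate_true here stems from the treatment probability depending on X[:, 0], which is the kind of confounding that conditioning on covariates (and, in the paper, on treatment-specific factor scores) is meant to address.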