Atlantic Causal Inference Conference (ACIC) Data Analysis Challenge 2017

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

This brief note documents the data generating processes used in the 2017 Data Analysis Challenge associated with the Atlantic Causal Inference Conference (ACIC). The focus of the challenge was estimation and inference for conditional average treatment effects (CATEs) in the presence of targeted selection, which leads to strong confounding. The associated data files and further plots can be found on the first author’s web page.

💡 Research Summary

The paper documents the data‑generating processes (DGPs) and evaluation framework used in the 2017 Atlantic Causal Inference Conference (ACIC) Data Analysis Challenge. The challenge was designed to test methods for estimating conditional average treatment effects (CATEs) under “targeted selection,” a setting where the probability of receiving treatment depends on the expected untreated outcome, creating strong confounding.
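Why targeted selection produces strong confounding can be shown with a toy simulation. This is a minimal sketch under assumed functional forms that are not taken from the challenge: the expected untreated outcome μ(x) is simply the covariate x, the treatment probability is a logistic function of μ(x) with a hypothetical `strength` parameter, and the treatment effect is a constant 1. Because units with high expected untreated outcomes are more likely to be treated, the naive difference in group means overstates the effect.

```python
import math
import random


def simulate_targeted_selection(n=20_000, tau=1.0, strength=2.0, seed=0):
    """Toy targeted-selection simulation (hypothetical forms, NOT the ACIC 2017 DGP).

    Returns (true_effect, naive_difference_in_means).
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        mu = x                                          # assumed expected untreated outcome
        pi = 1.0 / (1.0 + math.exp(-strength * mu))     # treatment probability rises with mu
        z = rng.random() < pi                           # Bernoulli(pi) treatment draw
        y = mu + tau * z + rng.gauss(0.0, 0.5)          # observed outcome with noise
        (treated if z else control).append(y)
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    return tau, naive


true_effect, naive = simulate_targeted_selection()
print(f"true effect = {true_effect:.2f}, naive estimate = {naive:.2f}")
```

Setting `strength=0.0` makes treatment independent of μ(x), and the naive estimate then lands close to the true effect; larger values widen the gap, which is the kind of confounding the challenge was built to stress.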

To this end, the organizers created 32 distinct DGPs. For each DGP they generated 250 independent replicates, yielding a total of 8,000 synthetic data sets. All DGPs satisfy strong ignorability and the stronger no‑unmeasured‑moderation assumption, so that the true CATE is well defined. The covariates are a subset of eight variables drawn from the Infant Health and Development Program (IHDP) data (mother’s age, cigarettes per day, endocrine condition, nervous system condition, obstetric complications, birth place, race, and child’s bilirubin).

Error structures are divided into four families: (1) additive independent and identically distributed (i.i.d.) Gaussian errors, (2) additive errors with group‑correlated components (shared by units within each of the 16 levels of the birth‑place variable), (3) additive heteroskedastic errors (variance varies with birth place), and (4) non‑additive errors obtained by applying a nonlinear CDF transformation to an additive‑error outcome. Within each family the authors varied three parameters: effect magnitude (ξ ∈ {1/3, 2}), noise level (η ∈ {1/4, 5/4}), and selection strength (κ ∈ {(0.5, 0), (3, −1)}). Crossing the low and high settings of these three parameters yields 2³ = 8 scenarios per error family, and 4 × 8 = 32 DGPs in total.
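The full factorial design can be enumerated directly. The sketch below crosses the four error families with the two settings of ξ, η and κ listed above and confirms the DGP and data-set counts. The family labels and the dictionary layout are hypothetical shorthand, and the enumeration order is not claimed to match the official DGP numbering.

```python
from fractions import Fraction
from itertools import product

# Hypothetical shorthand labels for the four error families described in the text.
ERROR_FAMILIES = ["iid", "group_correlated", "heteroskedastic", "non_additive"]

# Low/high settings for each parameter, copied from the text.
XI = [Fraction(1, 3), Fraction(2)]        # effect magnitude
ETA = [Fraction(1, 4), Fraction(5, 4)]    # noise level
KAPPA = [(0.5, 0), (3, -1)]               # selection strength

REPLICATES_PER_DGP = 250


def dgp_grid():
    """Return one dict per (error family, xi, eta, kappa) combination."""
    return [
        {"errors": fam, "xi": xi, "eta": eta, "kappa": kappa}
        for fam, xi, eta, kappa in product(ERROR_FAMILIES, XI, ETA, KAPPA)
    ]


grid = dgp_grid()
print(len(grid))                          # 32
print(len(grid) * REPLICATES_PER_DGP)     # 8000
```

`Fraction` keeps 1/3 and 5/4 exact, so each configuration is a hashable, unambiguous key if you want to index replicates by setting.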

The structural equations are:

μ(x) = −sin(Φ(π(x))) + x₄₃,
τ(x) = ξ·(x₃·x₂₄ + (x₁₄ − 1) − (x₁₅ − 1)),
π(x) = Pr(Z = 1|x) =
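The two fully stated structural equations translate directly into code. This is a minimal sketch that assumes Φ is the standard normal CDF and represents covariates as a dict keyed by index. Because the propensity formula is truncated here, π(x) is passed in by the caller rather than computed; the covariate values and propensity in the example are made up.

```python
import math


def std_normal_cdf(u):
    """Standard normal CDF Φ(u), written with math.erf (assumed meaning of Φ)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def mu(x, pi_x):
    """Prognostic function μ(x) = -sin(Φ(π(x))) + x43.

    `x` maps covariate index -> value; `pi_x` is the propensity π(x),
    supplied by the caller because its formula is not reproduced above.
    """
    return -math.sin(std_normal_cdf(pi_x)) + x[43]


def tau(x, xi):
    """Treatment effect τ(x) = ξ·(x3·x24 + (x14 - 1) - (x15 - 1))."""
    return xi * (x[3] * x[24] + (x[14] - 1) - (x[15] - 1))


# Example with made-up covariate values and a made-up propensity of 0.5.
x = {3: 1.0, 14: 1.0, 15: 0.0, 24: 2.0, 43: 0.3}
print(mu(x, pi_x=0.5))
print(tau(x, xi=2.0))    # 6.0
```

Note that the offsets in τ(x) cancel algebraically, (x₁₄ − 1) − (x₁₅ − 1) = x₁₄ − x₁₅, and ξ scales the whole effect, which is how the "effect magnitude" setting enters.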