Two-way Fixed Effects and Differences-in-Differences Estimators with Several Treatments

We study two-way-fixed-effects regressions (TWFE) with several treatment variables. Under a parallel trends assumption, we show that the coefficient on each treatment identifies a weighted sum of that treatment’s effect, with possibly negative weights, plus a weighted sum of the effects of the other treatments. Thus, those estimators are not robust to heterogeneous effects and may be contaminated by other treatments’ effects. We further show that omitting a treatment from the regression can actually reduce the estimator’s bias, unlike what would happen under constant treatment effects. We propose an alternative difference-in-differences estimator, robust to heterogeneous effects and immune to the contamination problem. In the application we consider, the TWFE regression identifies a highly non-convex combination of effects, with large contamination weights, and one of its coefficients significantly differs from our heterogeneity-robust estimator.


💡 Research Summary

The paper investigates the properties of two‑way fixed‑effects (TWFE) regressions when several treatment variables are included simultaneously. Under the standard parallel‑trends assumption, the authors show that the coefficient on each treatment does not simply capture a weighted average of that treatment’s causal effects; instead it decomposes into two parts. The first part is a weighted sum of the treatment’s own effects across groups and time periods, with weights that may be negative but sum to one—exactly the situation identified in the recent literature for a single‑treatment TWFE. The second part, which the authors term the “contamination term,” is a weighted sum of the effects of the other treatments, with weights that sum to zero. Consequently, unless the effects of the other treatments are homogeneous, the estimated coefficient is contaminated by those other effects.
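The decomposition described above can be written compactly for the case of two binary treatments. The notation here is an assumption made for illustration and is not taken from the paper: $\beta_1$ is the TWFE coefficient on treatment 1, $\Delta^{1}_{g,t}$ and $\Delta^{2}_{g,t}$ are the effects of treatments 1 and 2 in group $g$ at period $t$, and $w_{g,t}$ and $w^{c}_{g,t}$ are the own-effect and contamination weights.

```latex
\beta_1
  = \underbrace{\sum_{(g,t):\, D^{1}_{g,t}=1} w_{g,t}\, \Delta^{1}_{g,t}}_{\text{own effects}}
  + \underbrace{\sum_{(g,t):\, D^{2}_{g,t}=1} w^{c}_{g,t}\, \Delta^{2}_{g,t}}_{\text{contamination term}},
\qquad
\sum_{(g,t):\, D^{1}_{g,t}=1} w_{g,t} = 1,
\qquad
\sum_{(g,t):\, D^{2}_{g,t}=1} w^{c}_{g,t} = 0.

% If treatment 2 has a constant effect, \Delta^{2}_{g,t} = \delta for all (g,t), then
\sum_{(g,t):\, D^{2}_{g,t}=1} w^{c}_{g,t}\, \delta
  = \delta \sum_{(g,t):\, D^{2}_{g,t}=1} w^{c}_{g,t}
  = 0 .
```

The last line shows why the contamination term vanishes only when the other treatment's effect is homogeneous: a constant factors out of a sum whose weights add up to zero.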

The authors illustrate the contamination mechanism with two simple examples that involve “forbidden comparisons” (a term borrowed from Borusyak and Jaravel 2017). In the first example, the coefficient on treatment 1 is driven by a difference-in-differences (DID) that compares a group moving from untreated to receiving both treatments with a control group moving from untreated to receiving only treatment 2. If treatment 2’s effect is the same in both groups, it cancels out; if not, it biases the coefficient on treatment 1. The second example uses a control group that receives treatment 2 in both periods; if the effect of treatment 2 changes over time, it again contaminates the estimate of treatment 1. These examples show that, with several treatments, TWFE relies on “forbidden” comparisons under which its coefficients no longer have the usual causal interpretation.
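The arithmetic behind the two examples can be checked with a small noiseless sketch. All numbers and function names below are hypothetical, outcomes follow a common trend so that parallel trends holds by construction, and the switching group in the second example is assumed to move from untreated to treatment 1 only, a detail the summary does not specify.

```python
def outcome(baseline, trend, d1, d2, effect1, effect2):
    """Noiseless outcome: group baseline + common trend + treatment effects."""
    return baseline + trend + d1 * effect1 + d2 * effect2


def did(switch_pre, switch_post, control_pre, control_post):
    """Difference-in-differences: switcher's change minus control's change."""
    return (switch_post - switch_pre) - (control_post - control_pre)


def example_one(effect2_switch, effect2_control, effect1=2.0, trend=1.0):
    # Switching group: untreated -> both treatments.
    s_pre = outcome(10.0, 0.0, 0, 0, effect1, effect2_switch)
    s_post = outcome(10.0, trend, 1, 1, effect1, effect2_switch)
    # Control group: untreated -> treatment 2 only.
    c_pre = outcome(20.0, 0.0, 0, 0, 0.0, effect2_control)
    c_post = outcome(20.0, trend, 0, 1, 0.0, effect2_control)
    return did(s_pre, s_post, c_pre, c_post)


def example_two(effect2_pre, effect2_post, effect1=2.0, trend=1.0):
    # Switching group: untreated -> treatment 1 only (an assumption).
    s_pre = outcome(10.0, 0.0, 0, 0, effect1, 0.0)
    s_post = outcome(10.0, trend, 1, 0, effect1, 0.0)
    # Control group: treatment 2 in both periods, with a possibly
    # time-varying effect.
    c_pre = outcome(20.0, 0.0, 0, 1, 0.0, effect2_pre)
    c_post = outcome(20.0, trend, 0, 1, 0.0, effect2_post)
    return did(s_pre, s_post, c_pre, c_post)


# The true effect of treatment 1 in the switching group is 2.0 throughout.
same_across_groups = example_one(effect2_switch=3.0, effect2_control=3.0)
differs_across_groups = example_one(effect2_switch=3.0, effect2_control=1.0)
stable_over_time = example_two(effect2_pre=3.0, effect2_post=3.0)
changes_over_time = example_two(effect2_pre=3.0, effect2_post=4.0)
```

In this toy setup the DID recovers 2.0 only when treatment 2's effect is equal across groups (first example) or stable over time (second example); otherwise the gap in treatment 2's effects is added to or subtracted from the estimate.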

A striking implication is that including the other treatments in the regression does not necessarily reduce bias: omitting a treatment can actually lower it, unlike what would happen under constant treatment effects. With heterogeneous effects, the bias of the reduced regression can be larger or smaller than that of the full regression, depending on the relative magnitudes of the two weighting schemes. The authors derive the maximal possible bias for both the full model (all treatments) and the reduced model (one treatment only) under a bounded-heterogeneity assumption, and show that the ratio of these maximal biases is identifiable from the data and independent of the bound itself. Hence, researchers can empirically assess whether controlling for additional treatments is likely to improve or worsen the estimator.
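One simple way to see why a ratio of worst-case biases need not depend on the heterogeneity bound is sketched below. This is a stylized illustration and not the paper's formula: it assumes the bias of each regression can be written as a weighted sum of effect deviations, each bounded in absolute value by `bound`, and the weight vectors are hypothetical.

```python
def worst_case_bias(weights, bound):
    """Largest |sum_i w_i * x_i| over deviations x_i with |x_i| <= bound.

    The maximum is reached by setting each x_i to +/- bound with the
    sign of w_i, which gives bound * sum_i |w_i|.
    """
    return bound * sum(abs(w) for w in weights)


def bias_ratio(weights_a, weights_b, bound):
    """Ratio of worst-case biases; the bound cancels out."""
    return worst_case_bias(weights_a, bound) / worst_case_bias(weights_b, bound)


# Hypothetical weight vectors for a "full" and a "reduced" regression.
full_weights = [0.75, -0.5, -0.25]
reduced_weights = [0.25, -0.25]

ratio_small_bound = bias_ratio(full_weights, reduced_weights, bound=1.0)
ratio_large_bound = bias_ratio(full_weights, reduced_weights, bound=4.0)
```

Because the bound multiplies both the numerator and the denominator, the ratio depends only on the weights, which are functions of the observed treatment variables.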

To address the identified shortcomings, the paper proposes a new DID estimator that is robust to heterogeneous effects and immune to the contamination problem. The estimator works by matching “switching groups” – groups whose focal treatment changes between periods while all other treatments remain constant – with “control groups” that experience no change in any treatment and had the same treatment profile as the switching groups in the preceding period. This double‑matching strategy guarantees that (i) any heterogeneity across groups in the effects of all treatments is differenced out, and (ii) any time‑varying heterogeneity in the effects of all treatments is also eliminated. The authors acknowledge that this approach may suffer from limited external validity and reduced statistical precision when suitable control groups are scarce. To mitigate inference issues, they provide two confidence‑interval constructions: a standard asymptotically valid interval under weak conditions, and an exact‑coverage interval that relies on normality but remains asymptotically valid without that assumption.
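The matching logic can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' estimator: treatments are binary, the panel is balanced, every switcher gets equal weight, and the function name `robust_did` and the data layout are hypothetical.

```python
from statistics import mean


def robust_did(panel, periods):
    """Average DID over switchers matched to same-profile stable controls.

    panel maps (group, period) -> (d1, d2, y); d1 is the focal treatment.
    Returns None when no switcher has a valid control group.
    """
    groups = sorted({g for g, _ in panel})
    estimates = []
    for prev, curr in zip(periods, periods[1:]):
        for s in groups:
            d1_prev, d2_prev, y_prev = panel[(s, prev)]
            d1_curr, d2_curr, y_curr = panel[(s, curr)]
            # Switchers: focal treatment changes, other treatment does not.
            if d1_curr == d1_prev or d2_curr != d2_prev:
                continue
            # Controls: same treatment profile as the switcher in the
            # previous period, and no change in any treatment.
            control_changes = [
                panel[(c, curr)][2] - panel[(c, prev)][2]
                for c in groups
                if panel[(c, prev)][:2] == (d1_prev, d2_prev)
                and panel[(c, curr)][:2] == (d1_prev, d2_prev)
            ]
            if not control_changes:
                continue  # no suitable control: the switcher is dropped
            sign = 1 if d1_curr > d1_prev else -1
            estimates.append(sign * ((y_curr - y_prev) - mean(control_changes)))
    return mean(estimates) if estimates else None


# Toy data with a common trend of +1 and a treatment-1 effect of 2.0 in A.
toy_panel = {
    ("A", 1): (0, 1, 13.0), ("A", 2): (1, 1, 16.0),  # switcher, d2 fixed at 1
    ("B", 1): (0, 1, 21.0), ("B", 2): (0, 1, 22.0),  # valid control
    ("C", 1): (0, 0, 30.0), ("C", 2): (0, 1, 36.0),  # d2 changes: excluded
}
toy_estimate = robust_did(toy_panel, periods=[1, 2])
```

Group C is excluded because its second treatment changes, so the estimate uses only the comparison of A with B. The `continue` branch for switchers without controls is where the precision and external-validity costs mentioned above arise.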

The methodology is illustrated with an empirical application revisiting Hotz and Xiao (2011). Using state-level data on daycare quality, the authors examine two regulations: a minimum number of schooling years for daycare directors and a minimum staff-to-child ratio. When both regulations are entered into a TWFE regression, the coefficient on the schooling-years regulation is a highly non-convex combination of its own effects and the effects of the staff-to-child ratio, with large negative weights attached to many cells. A TWFE regression that includes only the schooling-years regulation yields weights that are much smaller in magnitude, although its coefficient may still be biased. Computing the maximal bias metric shows that the full TWFE model’s bias can be roughly five times larger than that of the reduced model, suggesting that “controlling for more treatments” can be detrimental. Finally, the proposed heterogeneity-robust DID estimator produces an effect close to zero, significantly different from the TWFE estimate, which suggests that the TWFE result is driven by heterogeneous effects and contamination.

In sum, the paper makes three key contributions: (1) it formally characterizes the bias structure of multi-treatment TWFE regressions, revealing a contamination term absent in the single-treatment case; (2) it demonstrates that including additional treatments does not universally reduce bias, and provides a data-driven metric to compare maximal biases; (3) it introduces a practical DID estimator that is robust to heterogeneous effects and immune to contamination by other treatments. These results have immediate implications for applied researchers who routinely estimate policy effects with panel data and multiple, potentially overlapping interventions. The work cautions against naïve interpretation of TWFE coefficients in multi-treatment settings and offers a viable alternative for credible causal inference.

