Understanding and Mitigating the Impacts of Differentially Private Census Data on State Level Redistricting
Data from the Decennial Census is published only after applying a disclosure avoidance system (DAS). Data users were shaken by the adoption of differential privacy in the 2020 DAS, a radical departure from past methods. The goal of this paper is to better understand how the perturbations from the 2020 DAS combine with sharp legal thresholds to impact redistricting. We consider two redistricting settings in which a data user might be concerned about the impacts of privacy-preserving noise: drawing equal population districts and litigating voting rights cases. What discrepancies arise if the user does nothing to account for disclosure avoidance? How can the discrepancies be understood and accounted for? We study these questions by comparing the official 2010 Redistricting Data to the 2010 Demonstration Data (created using the 2020 DAS) in an analysis of millions of algorithmically generated state legislative redistricting plans. We find that thresholding can amplify the impact of the noise from disclosure avoidance. Large discrepancies do occur, but in ways that are well-captured by simple models and appear to be possible to account for. We demonstrate the utility of these models by proposing an approach to mitigate discrepancies when balancing district populations. At least for state legislatures, Alabama’s claim that differential privacy “inhibits a State’s right to draw fair lines” lacks support.
💡 Research Summary
This paper investigates how the adoption of differential privacy (DP) in the 2020 Census Disclosure Avoidance System (DAS) affects state‑legislative redistricting, focusing on two legally salient dimensions: the “One Person, One Vote” (OPOV) population‑equality standard and the Voting Rights Act (VRA) requirement to create majority‑minority districts (MMDs). The authors compare the official 2010 Redistricting Data (SWAP), which reflects the traditional swapping‑based DAS, with the 2010 Demonstration Data (DEMO) that applies the 2020 DP‑based DAS to the same confidential Census Edited File (CEF). Because the confidential CEF is unavailable to external researchers, DEMO serves as a stand‑in for the noisy 2020 data, while SWAP stands in for the “ground‑truth” data.
Using an ensemble‑based approach, the researchers generate millions of plausible redistricting plans for 93 state legislative chambers (senate and house) and 52 state houses. Plans are sampled under constraints or objectives defined on DEMO (the noisy data). Each plan is then evaluated on SWAP to measure “discrepancies”: cases where a plan that satisfies a legal threshold on DEMO fails to meet the same threshold on SWAP.
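The discrepancy check described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the unit names, toy populations, and function names are hypothetical, and "deviation" is assumed here to mean the largest relative distance of any district from the ideal (equal-share) population, which may differ from the paper's exact definition.

```python
from collections import defaultdict

def max_deviation(assignment, populations, num_districts):
    """Largest |district population - ideal| / ideal over all districts.

    Assumption: deviation is measured against the ideal population,
    i.e. total population divided by the number of districts.
    """
    totals = defaultdict(int)
    for unit, district in assignment.items():
        totals[district] += populations[unit]
    ideal = sum(populations.values()) / num_districts
    return max(abs(totals[d] - ideal) / ideal for d in range(num_districts))

def is_discrepancy(assignment, demo_pop, swap_pop, num_districts, threshold=0.05):
    """True when the plan meets the threshold on DEMO but not on SWAP."""
    ok_on_demo = max_deviation(assignment, demo_pop, num_districts) <= threshold
    ok_on_swap = max_deviation(assignment, swap_pop, num_districts) <= threshold
    return ok_on_demo and not ok_on_swap

# Hypothetical toy data: four building blocks split into two districts.
assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
demo_pop = {"a": 100, "b": 105, "c": 100, "d": 95}   # noisy counts
swap_pop = {"a": 104, "b": 110, "c": 96, "d": 90}    # reference counts

demo_dev = max_deviation(assignment, demo_pop, 2)
swap_dev = max_deviation(assignment, swap_pop, 2)
flagged = is_discrepancy(assignment, demo_pop, swap_pop, 2)
```

In this toy plan the districts are within the bound on the noisy counts but outside it on the reference counts, so the plan is flagged as a discrepancy.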
Key findings for OPOV: When plans are constrained to a 5 % population deviation on DEMO (the de‑facto target used in prior work), a substantial fraction of them exceed the 5 % deviation on SWAP. In roughly half of the states examined, at least 40 % of DEMO‑compliant plans are non‑compliant on SWAP. However, the authors demonstrate a simple mitigation: tightening the sampling constraint slightly (e.g., to 4.5 % or 4 % deviation) dramatically reduces the discrepancy rate. With a 4.5 % bound, more than half of the states exhibit zero violations on SWAP; with a 4 % bound, the violation rate drops to zero in about 90 % of states. A probabilistic model treating block‑level DP noise as independent normal perturbations explains why a modest “offset” in the sampling threshold effectively buffers against the amplified noise at the district level.
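To see why a small offset helps, the following sketch computes the chance that a single district exceeds the legal bound on the reference data, given its deviation on the noisy data. It assumes the district-level difference between the two datasets is zero-mean normal, as in the model summarized above; the function names and the noise scale `REL_SIGMA` are hypothetical values chosen for illustration, not figures from the paper.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def violation_probability(demo_dev, threshold, rel_sigma):
    """P(|SWAP deviation| > threshold) for one district.

    demo_dev:  signed relative deviation observed on the noisy data
    threshold: legal bound on the absolute relative deviation
    rel_sigma: std. dev. of the district-level noise divided by the
               ideal population (assumed known; hypothetical here)
    """
    above = normal_cdf((demo_dev - threshold) / rel_sigma)
    below = 1.0 - normal_cdf((demo_dev + threshold) / rel_sigma)
    return above + below

LEGAL_BOUND = 0.05   # 5 % deviation target
REL_SIGMA = 0.002    # hypothetical noise scale, for illustration only

# A district sitting exactly at the bound on noisy data is a coin flip.
p_at_limit = violation_probability(0.05, LEGAL_BOUND, REL_SIGMA)

# Sampling with a 4.5 % bound leaves a buffer of 2.5 noise std. devs.
p_with_offset = violation_probability(0.045, LEGAL_BOUND, REL_SIGMA)
```

Under these assumed numbers, a district at the edge of the bound fails about half the time, while the same district held to the tighter bound fails well under 1 % of the time. The appropriate offset in practice depends on the actual noise scale relative to district size.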
Key findings for MMDs: Plans drawn on DEMO often have a different number of Black‑majority districts when evaluated on SWAP. For Georgia’s state‑house plans, 9 % of DEMO‑generated plans have a different MMD count on SWAP, typically fewer. When the algorithm explicitly maximizes MMDs, the discrepancy jumps to 66 %. These discrepancies indicate that DP noise can both under‑ and over‑state minority concentrations relative to SWAP, complicating VRA litigation strategies that rely on demonstrating the feasibility of additional majority‑minority districts.
The authors provide simple analytical models for both phenomena. For population deviation, they model each block’s DP noise as a zero‑mean Gaussian with known variance; the district‑level deviation then follows a summed Gaussian distribution, allowing calculation of the probability that a DEMO‑compliant plan violates the SWAP threshold. For MMDs, they model the presence of a majority‑minority block as a Bernoulli trial with a probability perturbed by DP noise, showing how maximizing the objective amplifies the variance of the resulting MMD count.
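The population-deviation model can be written out as follows. This is a reconstruction from the description above rather than the paper's own notation: the symbols $\varepsilon_b$, $\sigma_b$, $I$, $\tau$, and $d$ are assumptions introduced here, and the ideal population $I$ is taken to be the same in both datasets.

```latex
\begin{aligned}
P^{\mathrm{DEMO}}_D &= P^{\mathrm{SWAP}}_D + \sum_{b \in D} \varepsilon_b,
  \qquad \varepsilon_b \sim \mathcal{N}(0, \sigma_b^2)\ \text{independent}, \\
\sigma_D^2 &= \sum_{b \in D} \sigma_b^2,
  \qquad s_D = \frac{\sigma_D}{I},
  \qquad d^{\,\cdot}_D = \frac{P^{\,\cdot}_D - I}{I}, \\
\Pr\!\left( \bigl| d^{\mathrm{SWAP}}_D \bigr| > \tau \;\middle|\; d^{\mathrm{DEMO}}_D = d \right)
  &= \Phi\!\left( \frac{d - \tau}{s_D} \right) + 1 - \Phi\!\left( \frac{d + \tau}{s_D} \right).
\end{aligned}
```

Here $D$ is a district, $b$ ranges over its blocks, $\tau$ is the legal deviation threshold, and $\Phi$ is the standard normal CDF. The last line treats the noise as independent of the observed DEMO deviation, which is a simplification. It shows that the violation probability is governed by the gap $\tau - |d|$ measured in units of $s_D$, which is why a small reduction in the sampling bound can remove most violations.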
Legal discussion contrasts Alabama’s lawsuit, which treats the confidential CEF as the “true” basis for redistricting, with the prevailing judicial view that the published Redistricting Data (even if noisy) remains the only benchmark for compliance. The paper argues that, at least for state‑legislative redistricting using precinct‑level building blocks, the DP‑induced noise is manageable and does not constitute an unconstitutional barrier to drawing “fair” maps. The proposed offset strategy and probabilistic understanding provide practical tools for mapmakers to anticipate and correct for DP‑related errors without needing access to the confidential CEF.
Limitations include the focus on state legislative chambers (not congressional districts), reliance on SWAP as a proxy for the confidential CEF, and the specific precinct‑based methodology. Extending the analysis to other redistricting contexts (e.g., municipal districts, different objective functions) would require adapting the sampling and modeling techniques.
In sum, the study shows that while the 2020 DP‑based DAS introduces non‑trivial noise into census counts, its impact on legally relevant redistricting thresholds can be quantified, modeled, and mitigated. The findings refute the claim that differential privacy “inhibits a state’s right to draw fair lines,” demonstrating that with modest technical adjustments, states can produce maps that satisfy both OPOV and VRA requirements even when using DP‑perturbed data.