Revisiting Differentially Private Hypothesis Tests for Categorical Data
In this paper, we consider methods for performing hypothesis tests on data protected by a statistical disclosure control technology known as differential privacy. Previous approaches to differentially private hypothesis testing either perturbed the test statistic with random noise having large variance (and resulted in a significant loss of power) or added smaller amounts of noise directly to the data but failed to adjust the test in response to the added noise (resulting in biased, unreliable $p$-values). In this paper, we develop a variety of practical hypothesis tests that address these problems. Using a different asymptotic regime that is more suited to hypothesis testing with privacy, we show a modified equivalence between chi-squared tests and likelihood ratio tests. We then develop differentially private likelihood ratio and chi-squared tests for a variety of applications on tabular data (i.e., independence, sample proportions, and goodness-of-fit tests). Experimental evaluations on small and large datasets using a wide variety of privacy settings demonstrate the practicality and reliability of our methods.
💡 Research Summary
This paper addresses the problem of performing classical categorical hypothesis tests (goodness‑of‑fit, two‑sample proportion, and independence) under the constraints of differential privacy (DP). Prior work either added large‑variance noise directly to the test statistic, which dramatically reduced statistical power, or added smaller noise to the raw counts but then applied standard, non‑private statistical software without adjusting for the noise, leading to biased and unreliable p‑values. The authors propose a unified framework that adds Laplace noise (or, more generally, any zero‑mean, finite‑variance noise) to the input contingency tables and then recomputes the test statistics under a newly defined asymptotic regime that explicitly accounts for the noise magnitude.
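The noise-addition step described above can be illustrated with a minimal sketch. This is not the authors' code: the function names `laplace_noise` and `privatize_table`, the default `sensitivity=2.0` (which assumes neighboring datasets differ by replacing one record, so two cells each change by one), and the example table are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution (variance 2 * scale**2).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_table(table, epsilon, sensitivity=2.0, rng=None):
    """Add independent Laplace noise to every cell of a contingency table.

    `sensitivity` is an assumption: 2.0 corresponds to replacing one record,
    1.0 to adding or removing one record.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon  # Laplace scale b
    return [[cell + laplace_noise(scale, rng) for cell in row] for row in table]


# Hypothetical 2x2 table of counts.
table = [[30, 20], [25, 25]]
noisy = privatize_table(table, epsilon=0.5, rng=random.Random(0))
for row in noisy:
    print([round(cell, 2) for cell in row])
```

The noisy cells are real-valued and may be negative, which is one reason standard test software applied to them directly can misbehave.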
The key theoretical contribution is a “modified asymptotic regime” in which the sample size n and the noise scale b grow at comparable rates, rather than assuming b is negligible relative to n. Under this regime, the authors prove that the noisy test statistics still converge to a scaled chi‑squared distribution, with the scaling factor incorporating the variance contributed by the privacy noise. They also establish that, despite the presence of noise, the likelihood‑ratio (LR) and chi‑squared (χ²) tests remain asymptotically equivalent, preserving a fundamental relationship from classical statistics.
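To make concrete what "accounting for the noise" means, here is a hedged sketch that swaps in a different technique: instead of the paper's asymptotic scaled chi‑squared approximation, it estimates the null distribution of a noisy goodness‑of‑fit statistic by Monte Carlo simulation. It is not the authors' method. All names (`chi2_stat`, `noisy_null_draw`, `monte_carlo_p_value`) are hypothetical, and the sketch assumes the sample size `n` and the Laplace scale are public.

```python
import random


def laplace_noise(scale, rng):
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def chi2_stat(counts, probs, n):
    # Pearson statistic evaluated on (possibly noisy) counts.
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def noisy_null_draw(probs, n, scale, rng):
    # Sample multinomial counts under the null, then add the same
    # Laplace noise that the privacy mechanism would add.
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[k] += 1
    return [c + laplace_noise(scale, rng) for c in counts]


def monte_carlo_p_value(noisy_counts, probs, n, scale, reps=2000, rng=None):
    # Fraction of simulated noisy null statistics at least as large as
    # the observed one (with the usual +1 correction).
    rng = rng or random.Random()
    observed = chi2_stat(noisy_counts, probs, n)
    exceed = sum(
        chi2_stat(noisy_null_draw(probs, n, scale, rng), probs, n) >= observed
        for _ in range(reps)
    )
    return (exceed + 1) / (reps + 1)


# Hypothetical noisy counts for a four-category uniform null.
p = monte_carlo_p_value([58.3, 41.9, 52.1, 47.0], [0.25] * 4, n=200,
                        scale=4.0, reps=500, rng=random.Random(0))
print(p)
```

Because the reference distribution is simulated with the noise included, the p‑value reflects both sampling variability and privacy noise, which is the adjustment the naive approach omits.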
For each of the three testing scenarios, the paper derives explicit formulas:
- Goodness‑of‑fit – After adding Laplace noise η