Computational Hardness of Private Coreset


We study the problem of differentially private (DP) computation of coresets for the $k$-means objective. For a given input set of points, a coreset is another set of points such that the $k$-means objective for any candidate solution is preserved up to a multiplicative $(1 \pm \alpha)$ factor (and some additive factor). We prove the first computational lower bounds for this problem. Specifically, assuming the existence of one-way functions, we show that no polynomial-time $(\varepsilon, 1/n^{\omega(1)})$-DP algorithm can compute a coreset for $k$-means in the $\ell_\infty$-metric for some constant $\alpha > 0$ (and some constant additive factor), even for $k = 3$. For $k$-means in the Euclidean metric, we show a similar result, but only for $\alpha = \Theta\left(1/d^2\right)$, where $d$ is the dimension.


💡 Research Summary

The paper investigates the computational limits of constructing differentially private (DP) coresets for the k‑means clustering objective. A coreset is a (possibly weighted) small summary of a dataset that approximates the k‑means cost of any set of k centers within a multiplicative factor (1 ± α) and an additive term β. While non‑private coreset construction is well‑studied and can be trivial if size is unrestricted, DP coreset construction has so far only been achieved by algorithms with exponential dependence on the number of clusters k or the dimension d, leaving open whether this exponential blow‑up is inherent.
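The coreset guarantee described above can be made concrete with a small sketch. The following Python snippet (illustrative only; the function names and the candidate centers are my own, not from the paper) checks whether a weighted summary S satisfies the (α, β)-coreset condition for a list of candidate center sets, and shows the trivial non-private case: the input itself with unit weights is always a coreset with α = β = 0.

```python
import numpy as np

def kmeans_cost(points, weights, centers):
    """Weighted k-means cost: sum_i w_i * min_c ||x_i - c||^2."""
    # pairwise squared Euclidean distances, shape (n_points, k)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((weights * d2.min(axis=1)).sum())

def is_coreset(P, S, w, candidate_center_sets, alpha, beta):
    """Check the coreset guarantee: for every candidate set C, the cost on
    (S, w) must lie in [(1-alpha)*cost_P - beta, (1+alpha)*cost_P + beta]."""
    for C in candidate_center_sets:
        cost_P = kmeans_cost(P, np.ones(len(P)), C)
        cost_S = kmeans_cost(S, w, C)
        if not ((1 - alpha) * cost_P - beta
                <= cost_S
                <= (1 + alpha) * cost_P + beta):
            return False
    return True

# Without privacy, outputting the input itself trivially satisfies the
# definition with alpha = beta = 0 (hypothetical toy data).
P = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
C = np.array([[0.0, 0.0], [4.0, 4.0]])
ok = is_coreset(P, P, np.ones(len(P)), [C], alpha=0.0, beta=0.0)
print(ok)  # True
```

This makes explicit why, as the summary notes, coreset construction is only interesting once size or privacy constraints are imposed.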

The authors provide the first computational lower bounds for DP coreset construction under standard cryptographic assumptions (the existence of one‑way functions). Their results are twofold:

  1. Hardness in the ℓ∞‑metric: For the ℓ∞ norm and even for k = 3, no polynomial‑time (ε, δ)-DP algorithm with δ = 1/n^{ω(1)} can output a (k, ∞, α, β)‑coreset for some constant α > 0 and some constant β > 0. In other words, even when a fixed constant multiplicative error and a fixed constant additive error are allowed, privately computing a coreset is computationally infeasible.

  2. Hardness in the Euclidean (ℓ2) metric: For the Euclidean norm, the same impossibility holds, but only for multiplicative error α = Θ(1/d²), where d is the ambient dimension. As the dimension grows, the hard regime therefore shrinks: the result rules out only very fine approximations, and leaves open whether constant‑factor DP coresets in ℓ2 can be computed efficiently.

The technical core of the proof is a reduction from the hardness of privately generating synthetic data for 3‑literal disjunction queries, a problem studied by Ullman and Vadhan (2020). Their result shows that, assuming one‑way functions, no polynomial‑time (ε, 1/n^{ω(1)})‑DP “promise sanitizer” can accurately answer all 3‑disjunction queries that evaluate to 1 on the input dataset. The authors embed the dataset from the sanitization problem into a k‑means instance: each 3‑literal clause (indices i₁, i₂, i₃ with signs s₁, s₂, s₃) is represented by three centers, where center j has a single non‑zero coordinate equal to 2·s_j at position i_j and zeros elsewhere. For a point x ∈ {±1}^d, the ℓ∞ distance from x to the nearest of these centers is small (equal to 1) exactly when the clause is satisfied, and large (equal to 3) otherwise. Consequently, the k‑means cost of a dataset encodes the truth values of all clauses.
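The clause-to-centers embedding can be sketched numerically. In this illustrative Python snippet (helper names and the particular clause are my own choices, not taken from the paper), a ±1 vector satisfying the clause sits at ℓ∞ distance 1 from the nearest center, while a vector violating all three literals sits at distance 3:

```python
import numpy as np

def clause_centers(d, idx, signs):
    """Three centers for the clause OR_j (x[idx[j]] == signs[j]):
    center j has 2*signs[j] at coordinate idx[j], zeros elsewhere."""
    centers = np.zeros((3, d))
    for j in range(3):
        centers[j, idx[j]] = 2 * signs[j]
    return centers

def linf_to_nearest(x, centers):
    """ℓ∞ distance from x to its nearest center."""
    return float(np.abs(x[None, :] - centers).max(axis=1).min())

d = 8
idx, signs = [0, 3, 5], [1, -1, 1]   # hypothetical clause: x0=+1 or x3=-1 or x5=+1
centers = clause_centers(d, idx, signs)

x_sat = np.ones(d)                   # satisfies the first literal (x0 = +1)
x_unsat = np.ones(d)
x_unsat[0] = x_unsat[5] = -1         # now all three literals are violated

d_sat = linf_to_nearest(x_sat, centers)      # off-clause coords contribute 1,
d_unsat = linf_to_nearest(x_unsat, centers)  # a violated literal contributes 3
print(d_sat, d_unsat)  # 1.0 3.0
```

The constant 1-versus-3 gap is what lets the ℓ∞ reduction tolerate any sufficiently small constant multiplicative error α.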

If a DP algorithm could produce a coreset that preserves k‑means costs within (1 ± α) multiplicative error, one could round each coreset point to its sign vector, thereby reconstructing a synthetic dataset that correctly answers all 3‑disjunction queries evaluating to 1 on the input. This would contradict the hardness of the promise sanitization problem. The reduction works directly for ℓ∞, where the distance gap between satisfied and unsatisfied clauses is constant (1 versus 3). For ℓ2, the gap shrinks to 1 + Θ(1/d), which explains why the hardness only applies when α is on the order of 1/d².

The paper also notes that no restriction is placed on coreset size: without the DP constraint, outputting the entire input as a coreset trivially satisfies the definition, so the computational barrier stems solely from the privacy requirement. The authors discuss related work on DP lower bounds, noting that prior results relied either on artificially constructed query families or on discrete CSPs; their contribution extends hardness to natural continuous‑domain queries (k‑means costs).

In conclusion, the work shows that the exponential dependence on k or d observed in existing DP coreset algorithms is not merely a technical artifact but reflects a fundamental computational hardness under widely believed cryptographic assumptions. This establishes a clear separation between information‑theoretic feasibility (coresets exist) and algorithmic tractability in the DP setting, and suggests that future research must either relax privacy parameters, restrict the query class, or rely on stronger cryptographic assumptions to obtain efficient DP coreset constructions.

