Constant Factor Approximation for Balanced Cut in the PIE model
We propose and study a new semi-random semi-adversarial model for Balanced Cut, a planted model with permutation-invariant random edges (PIE). Our model is much more general than planted models considered previously. Consider a set of vertices $V$ partitioned into two clusters $L$ and $R$ of equal size. Let $G$ be an arbitrary graph on $V$ with no edges between $L$ and $R$. Let $E_{random}$ be a set of edges sampled from an arbitrary permutation-invariant distribution (a distribution that is invariant under permutation of vertices in $L$ and in $R$). Then we say that $G + E_{random}$ is a graph with permutation-invariant random edges. We present an approximation algorithm for the Balanced Cut problem that finds a balanced cut of cost $O(|E_{random}|) + n \text{polylog}(n)$ in this model. In the regime when $|E_{random}| = \Omega(n \text{polylog}(n))$, this is a constant factor approximation with respect to the cost of the planted cut.
💡 Research Summary
The paper introduces a novel semi‑random, semi‑adversarial model for the Balanced Cut problem, called the PIE (Permutation‑Invariant Random Edges) model. In this model the vertex set V is split into two equal‑size clusters L and R. An adversary may choose any graph G on V that contains no edges crossing the planted cut (L,R). A second adversary independently chooses an arbitrary graph H on a fresh copy of the vertex set. Nature then draws a random bijection π that maps the L‑side of H onto L and the R‑side onto R, uniformly among all such bijections. The final graph is F = G ⊞ πH, i.e., the union of the edges of G and the permuted edges of H. The distribution of the random edges E_random = π(E_H) is required only to be invariant under permutations that preserve the two sides; no independence, uniformity, or specific probability parameters are assumed. This captures a wide range of realistic scenarios, such as noisy similarity graphs, social networks with multiple tie types, or preferential‑attachment structures, where the “random” part may be highly dependent and structurally complex.
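The generative process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_pie_graph`, the edge-list representation, and the explicit `h_left` / `h_right` side labels for H are hypothetical choices, not notation from the paper.

```python
import random


def sample_pie_graph(left, right, g_edges, h_left, h_right, h_edges, seed=None):
    """Sketch of F = G + pi(H) for a uniformly random side-preserving bijection pi.

    left, right      : vertices of the planted clusters L and R
    g_edges          : adversarial edges of G (assumed not to cross the cut)
    h_left, h_right  : the two sides of the fresh vertex copy that H lives on
    h_edges          : adversarial edges of H
    Returns (all edges of F, the permuted edges E_random) as sets of frozensets.
    """
    if len(h_left) != len(left) or len(h_right) != len(right):
        raise ValueError("H's sides must match the sizes of L and R")
    rng = random.Random(seed)

    # Shuffle each side independently: this is a uniform bijection that
    # maps the L-side of H onto L and the R-side of H onto R.
    l_image = list(left)
    r_image = list(right)
    rng.shuffle(l_image)
    rng.shuffle(r_image)
    pi = dict(zip(h_left, l_image))
    pi.update(zip(h_right, r_image))

    e_random = {frozenset((pi[a], pi[b])) for a, b in h_edges}
    e_planted = {frozenset(e) for e in g_edges}
    return e_planted | e_random, e_random
```

Because only the bijection is random, any dependence structure that the adversary builds into H (for example a large biclique) survives in E_random; the sketch makes that visible by never sampling edges independently.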
The main technical result (Theorem 1.4) states that there exists a deterministic polynomial‑time algorithm which, given a graph sampled from the PIE model, outputs a Θ(1)‑balanced cut (S,T) whose edge‑cut size satisfies
|E(S,T)| = O(|E_random|) + O(n·polylog n)
with high probability over the random bijection. Consequently, when the number of random edges satisfies |E_random| = Ω(n·polylog n), the algorithm achieves a constant‑factor approximation with respect to the size of the planted cut. Importantly, the algorithm does not know G, the permutation‑invariant distribution D, nor the planted partition (L,R).
The algorithmic framework builds on an SDP relaxation similar to that of Arora‑Rao‑Vazirani but with a sphere constraint of radius √2/2. The SDP assigns a vector φ(u) to each vertex u. An edge is called δ‑short if the squared Euclidean distance between its endpoint vectors is at most δ, otherwise it is δ‑long. Intuitively, because the random edges are permuted uniformly, the probability that a random edge becomes short is tiny; most random edges are long and thus contribute a noticeable amount to the SDP objective. However, the SDP solution does depend on the random edges, and the vectors are not uniformly spread on the sphere, so a naïve counting argument does not suffice.
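The short/long classification is simple to state in code once an SDP solution is available. The sketch below assumes the vectors φ(u) are already given as a dictionary `phi` of coordinate tuples; the helper names are hypothetical, and solving the SDP itself is out of scope here.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def split_short_long(edges, phi, delta):
    """Partition edges into delta-short and delta-long ones.

    An edge (u, v) is delta-short when ||phi[u] - phi[v]||^2 <= delta,
    and delta-long otherwise, following the definition in the text.
    """
    short_edges, long_edges = [], []
    for u, v in edges:
        if squared_distance(phi[u], phi[v]) <= delta:
            short_edges.append((u, v))
        else:
            long_edges.append((u, v))
    return short_edges, long_edges
```

For example, with vectors on the sphere of radius √2/2, two vertices mapped to the same point form a short edge for any δ ≥ 0, while two vertices mapped to orthogonal points are at squared distance about 1 and form a long edge for any δ well below 1.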
To overcome these obstacles the authors introduce two key procedures:
- Heavy Vertices Removal – The algorithm repeatedly finds balls of radius δ on the sphere that contain many vertices (heavy balls). Cutting off all vertices inside such a ball removes a constant fraction of the total SDP contribution while cutting only a constant factor more edges in the original graph. By applying this step iteratively, the algorithm eliminates a large portion of the random edges while keeping the total number of removed edges proportional to |E_random|.
- Damage Control – The SDP solution defines a “skeleton” consisting of short edges of G. The skeleton may not cover the whole graph, and after several iterations some vertices may fall outside it. Damage Control selectively removes vertices not covered by the skeleton, ensuring that the remaining graph still admits a good SDP solution for the next iteration. This step guarantees that the algorithm can continue to make progress even when the skeleton shrinks.
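To make the idea of a heavy ball concrete, here is a deliberately simplified sketch of the peeling loop. It is not the paper's procedure: it assumes that ball centers are restricted to the vertex vectors themselves, that "heavy" means containing at least `min_size` vertices (a hypothetical threshold), and that the vectors stay fixed between removals, whereas the algorithm described above re-solves the SDP.

```python
import math


def find_heavy_ball(phi, radius, min_size):
    """Return (center, members) of a vertex-centered ball holding at least
    min_size vertices, or None if no such ball exists."""
    for center, x_center in phi.items():
        members = [v for v, x_v in phi.items()
                   if math.dist(x_center, x_v) <= radius]
        if len(members) >= min_size:
            return center, members
    return None


def peel_heavy_balls(phi, radius, min_size):
    """Repeatedly cut off heavy balls; return (removed groups, remaining vectors)."""
    remaining = dict(phi)
    removed = []
    while True:
        hit = find_heavy_ball(remaining, radius, min_size)
        if hit is None:
            break
        _, members = hit
        removed.append(members)
        for v in members:
            del remaining[v]
    return removed, remaining
```

Each removed group would be cut away from the rest of the graph; the vertices left in `remaining` are the ones for which no dense spherical region exists at the chosen radius.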
The overall algorithm proceeds in O(log n) phases. In each phase it solves the SDP, classifies edges as short or long, applies Heavy Vertices Removal to cut off dense spherical regions, and then uses Damage Control to prune uncovered vertices. The number of long random edges removed in a phase is at most |E_random|/δ, because each long edge contributes at least δ to the SDP objective, which is bounded by the cost of the planted cut. Summing over all phases yields a total cut size of O(|E_random|) plus an additive term O(n·log³ n) that arises from the SDP value and the overhead of the removal procedures.
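The counting step for long edges can be written out explicitly. The derivation below is a sketch under two assumptions that the summary does not spell out: the SDP objective is the sum of squared distances over the edges $E$ of the current graph, and it is normalized so that its optimum is at most the cost of the planted cut $(L, R)$. Since $G$ has no edges between $L$ and $R$, the planted cut only cuts edges of $E_{random}$.

$$
\delta \cdot \bigl|\{(u,v) \in E : \|\varphi(u) - \varphi(v)\|^2 > \delta\}\bigr|
\;\le\; \sum_{(u,v) \in E} \|\varphi(u) - \varphi(v)\|^2
\;=\; \mathrm{SDP}
\;\le\; \mathrm{cost}(L, R)
\;\le\; |E_{random}|
$$

Dividing by $\delta$ gives at most $|E_{random}|/\delta$ long edges in a phase, which is a constant multiple of $|E_{random}|$ when $\delta$ is a fixed constant.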
The paper compares the PIE model with two well‑studied models:
- Random planted (Stochastic Block) model – Both clusters are random G(n/2, p) graphs and cross‑cluster edges appear independently with probability q < p. In this setting the planted cut can be recovered exactly, but the model assumes strong independence and uniformity.
- Semi‑random model – The intra‑cluster graphs may be arbitrary, but the cross‑cluster edges are still generated independently, and an adversary may delete some of them. Exact recovery of the planted cut is no longer possible in general, but algorithms can still achieve O(|E_random|) guarantees.
The PIE model strictly generalizes both: intra‑cluster graphs are arbitrary, and the cross‑cluster edges come from any permutation‑invariant distribution, allowing dependencies, large bicliques, and complex structures. Consequently, exact recovery of the planted cut is information‑theoretically impossible, yet the authors show that a constant‑factor approximation is achievable under mild density conditions on the random edges.
The authors also discuss practical motivations. In clustering with noisy similarity measurements, the true intra‑cluster similarities form a structured graph (E_G) while measurement errors produce noisy cross‑cluster edges (E_random) that are plausibly permutation‑invariant. In social networks, local ties (friends, family) correspond to E_G, whereas global ties (co‑authorship, online follows) correspond to E_random. The PIE model captures the independence of these tie types and the lack of label information, making the algorithm applicable to real‑world network partitioning tasks.
Finally, the paper outlines future directions: extending the techniques to other graph partitioning problems (e.g., sparsest cut, multi‑way cuts), tightening the hidden constants and logarithmic factors, and exploring whether similar semi‑adversarial models can yield stronger guarantees for other combinatorial optimization problems. The work demonstrates that by carefully blending SDP relaxations with geometric removal procedures, one can obtain robust approximation guarantees even in highly adversarial, semi‑random environments.