A Note on the Inapproximability of Correlation Clustering


We consider the inapproximability of the correlation clustering problem, defined as follows: given a graph $G = (V,E)$ in which each edge is labeled either “+” (similar) or “-” (dissimilar), correlation clustering seeks to partition the vertices into clusters so that the number of pairs correctly (resp. incorrectly) classified with respect to the labels is maximized (resp. minimized). The two complementary problems are called MaxAgree and MinDisagree, respectively, and have been studied both on complete graphs, where every edge is labeled, and on general graphs, where some edges may be unlabeled. Natural edge-weighted versions of both problems have been studied as well. Let S-MaxAgree denote the weighted problem in which all weights are taken from a set S. We show that S-MaxAgree with weights bounded by $O(|V|^{1/2-\delta})$ essentially belongs to the same hardness class in the following sense: if there is a polynomial-time algorithm that approximates S-MaxAgree within a factor of $\lambda = O(\log{|V|})$ with high probability, then for any choice of S′, S′-MaxAgree can be approximated in polynomial time within a factor of $(\lambda + \epsilon)$ with high probability, where $\epsilon > 0$ can be arbitrarily small. An analogous statement holds for S-MinDisagree. This result implies that it is hard (assuming $NP \neq RP$) to approximate unweighted MaxAgree within a factor of $80/79-\epsilon$, improving upon the previously known factor of $116/115-\epsilon$ by Charikar et al. \cite{Chari05}.


💡 Research Summary

Correlation clustering is a fundamental combinatorial optimization problem in which each edge of a graph is labeled either “similar” (+) or “dissimilar” (–). The goal is to partition the vertex set into clusters so that the number of correctly classified pairs is maximized (MaxAgree) or, equivalently, the number of incorrectly classified pairs is minimized (MinDisagree). While the problem has been extensively studied on complete graphs (where every edge carries a label) and on general graphs (where some edges may be unlabeled), the approximation hardness of its weighted variants has remained only partially understood.
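The two objectives can be made concrete with a small helper. The representation below (edge labels as a dict over vertex pairs, a clustering as a vertex-to-cluster map) is our own illustration, not notation from the paper:

```python
def agreements_disagreements(labels, clustering):
    """Count correctly and incorrectly classified labeled pairs.

    labels: dict mapping frozenset({u, v}) -> '+' or '-'
    clustering: dict mapping each vertex to a cluster id
    A '+' pair is correct iff both endpoints share a cluster;
    a '-' pair is correct iff they do not.
    Returns (agreements, disagreements).
    """
    agree = disagree = 0
    for pair, sign in labels.items():
        u, v = tuple(pair)
        same_cluster = clustering[u] == clustering[v]
        if (sign == '+') == same_cluster:
            agree += 1
        else:
            disagree += 1
    return agree, disagree

# Toy instance: a triangle with one '-' edge.
labels = {
    frozenset({'a', 'b'}): '+',
    frozenset({'b', 'c'}): '+',
    frozenset({'a', 'c'}): '-',
}
# Putting all three vertices in one cluster misclassifies only the '-' edge.
print(agreements_disagreements(labels, {'a': 0, 'b': 0, 'c': 0}))  # (2, 1)
```

MaxAgree maximizes the first component over all clusterings; MinDisagree minimizes the second. On any fixed instance the two components always sum to the number of labeled pairs, which is why the problems are complementary.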

The present paper makes four major contributions. First, it introduces a unified reduction that works for any weight set S whose elements are bounded by O(|V|^{1/2 – δ}) for some constant δ > 0. The reduction consists of two steps: (i) scaling each “+” edge by a large integer N and each “–” edge by –N, and (ii) replacing every original vertex v by a bundle of N clones. All edges among the clones of the same vertex receive a weight of order N², which forces any near‑optimal clustering to keep each bundle of clones together. This construction preserves the optimal objective value up to a linear factor and introduces only a negligible additive error.
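The two-step construction can be sketched schematically. The exact placement of the scaled edges and the constant in the N² intra-bundle weights are simplifying assumptions of ours, so this is a toy illustration of the gadget's shape, not the paper's precise reduction:

```python
from itertools import combinations

def clone_gadget(signed_edges, N):
    """Illustrative sketch of the cloning gadget (constants and edge
    placement are our assumptions, not taken verbatim from the paper).

    signed_edges: dict frozenset({u, v}) -> +w for a '+' edge, -w for '-'
    Each vertex v becomes clones (v, 0), ..., (v, N-1). Clones of the
    same vertex are tied together by heavy '+' edges of weight N**2,
    and each original edge is scaled by N.
    """
    gadget = {}
    vertices = {x for pair in signed_edges for x in pair}
    # Step (ii): heavy intra-bundle edges force clones into one cluster.
    for v in vertices:
        for i, j in combinations(range(N), 2):
            gadget[frozenset({(v, i), (v, j)})] = N ** 2
    # Step (i): a scaled copy of each original edge between the bundles.
    for pair, w in signed_edges.items():
        u, v = tuple(pair)
        gadget[frozenset({(u, 0), (v, 0)})] = N * w
    return gadget

# One '+' edge of weight 1, N = 3: each bundle gets C(3, 2) = 3 heavy
# edges, plus the one scaled inter-bundle edge.
g = clone_gadget({frozenset({'a', 'b'}): 1}, N=3)
print(len(g))  # 7
```

The point of the heavy intra-bundle weights is that splitting a bundle costs Θ(N²), which dwarfs anything the scaled original edges (of magnitude O(N · |V|^{1/2 – δ})) can recover, so near-optimal solutions on the gadget correspond to clusterings of the original vertices.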

Second, the authors prove that if there exists a polynomial‑time algorithm that approximates S‑MaxAgree within a factor λ = O(log |V|) with high probability, then for any other weight set S′ the same algorithm can be turned into a (λ + ε)‑approximation for S′‑MaxAgree, where ε > 0 can be made arbitrarily small. The proof relies on a careful probabilistic analysis (Chernoff bounds and Markov’s inequality) to show that the random choices made during the cloning step succeed with overwhelming probability, placing the overall procedure in the RP complexity class.
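The summary names only the tools (Chernoff bounds and Markov's inequality) without reproducing the analysis. As a rough numerical illustration of the kind of tail bound involved, the snippet below checks that a Hoeffding/Chernoff-style bound exp(−2nt²) dominates the empirical upper-tail frequency of a Bernoulli sample mean; the parameters are arbitrary and chosen only for the demonstration:

```python
import math
import random

def chernoff_upper_tail(n, t):
    """Hoeffding/Chernoff-style bound on Pr[X/n >= p + t], X ~ Bin(n, p)."""
    return math.exp(-2 * n * t * t)

random.seed(0)
n, p, t, trials = 1000, 0.5, 0.05, 2000
# Empirical frequency of the sample mean exceeding p + t.
hits = sum(
    sum(random.random() < p for _ in range(n)) / n >= p + t
    for _ in range(trials)
)
empirical = hits / trials
bound = chernoff_upper_tail(n, t)
print(f"empirical tail = {empirical:.4f}, Chernoff bound = {bound:.4f}")
```

In the paper's setting, bounds of this shape are what let the randomized steps of the reduction fail with probability small enough to keep the overall procedure inside RP.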

Third, an analogous statement is shown for the MinDisagree variant. By simply flipping the sign of the scaled weights, the same cloning construction yields a reduction that preserves the minimum‑disagreement objective, leading to the same (λ + ε)‑approximation transfer result for any weight set.

Finally, the paper leverages the above general reduction to improve the known hardness of approximation for the unweighted MaxAgree problem. Using the reduction, the authors show that achieving a factor better than 80/79 – ε would imply a polynomial‑time RP algorithm for an NP‑hard problem, contradicting the widely believed separation NP ≠ RP. This improves upon the previous best lower bound of 116/115 – ε established by Charikar et al. (2005).

In summary, the work demonstrates that the approximation difficulty of correlation clustering is essentially independent of the specific weight magnitudes, as long as those magnitudes stay below a sub‑square‑root threshold. The result tightens the hardness landscape for both MaxAgree and MinDisagree, establishes a clean “hardness transfer” theorem across weight families, and raises new questions about whether even stronger constants (or constant‑factor approximations) can be ruled out for special graph classes or under stronger complexity assumptions. The techniques introduced—particularly the high‑weight cloning gadget—are likely to be useful in future reductions involving other clustering or partitioning problems with weighted constraints.

