k-Means has Polynomial Smoothed Complexity


The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points. In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
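The "smoothed number of iterations" mentioned above can be written out explicitly. The following is a sketch of the usual smoothed-analysis definition; the symbols X, x_i, g_i, T and d, and the normalization of the unperturbed points to the unit cube, are notation assumed here for illustration rather than taken from the text.

```latex
% Assumed notation: X = {x_1, ..., x_n} is the adversarial input in d dimensions,
% g_i are independent Gaussian perturbations, and T(.) counts k-means iterations.
\[
  \operatorname{smoothed}(n, \sigma)
  \;=\;
  \max_{\substack{X \subseteq [0,1]^d \\ |X| = n}}
  \;
  \mathbb{E}_{g_1, \dots, g_n \sim \mathcal{N}(0,\, \sigma^2 I_d)}
  \Bigl[\, T\bigl(\{x_1 + g_1, \dots, x_n + g_n\}\bigr) \Bigr]
\]
% The claim of the abstract, in this notation:
\[
  \operatorname{smoothed}(n, \sigma) \;\le\; \operatorname{poly}\!\left(n, \tfrac{1}{\sigma}\right)
\]
```

The maximum over X is what separates this from average-case analysis: the input is still chosen adversarially, and only the perturbation is random.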


💡 Research Summary

The paper addresses a long‑standing gap between the practical efficiency of the k‑means clustering algorithm and its theoretical worst‑case behavior. While it is known that k‑means can require an exponential number of iterations on pathological inputs, empirical evidence shows that it typically converges in a handful of steps. Smoothed analysis, a framework that studies algorithmic performance on inputs perturbed by small random noise, has been proposed to reconcile this discrepancy. However, prior smoothed analyses of k‑means have only yielded bounds that are super‑polynomial in n (e.g., polynomial in n^{√k} and 1/σ, which is super‑polynomial when the number of clusters k grows with n), and these still fall short of explaining the small iteration counts observed in practice.
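The perturbation model described above can be simulated directly. The following is a minimal sketch, not the authors' analysis or code: it adds Gaussian noise to a fixed input, runs the standard k‑means (Lloyd) iteration, and averages the iteration count over several trials. All function names, the random choice of initial centers, and the handling of empty clusters are assumptions made for illustration. Averaging over perturbations of one input only estimates the expectation for that input, whereas the smoothed bound takes the worst case over all inputs.

```python
import random


def perturb(points, sigma, rng):
    """Add independent Gaussian noise N(0, sigma^2) to every coordinate."""
    return [[x + rng.gauss(0.0, sigma) for x in p] for p in points]


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_iterations(points, k, rng, max_iter=10_000):
    """Run the k-means method and return how many rounds changed the clustering.

    Assumption: initial centers are k distinct input points chosen at random.
    """
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = None
    iterations = 0
    while iterations < max_iter:
        # Assignment step: each point goes to its nearest center
        # (ties are broken in favor of the lowest center index).
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the method has converged
        assignment = new_assignment
        iterations += 1
        # Update step: move each center to the mean of its cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return iterations


def estimate_smoothed_iterations(points, k, sigma, trials=20, seed=0):
    """Average the iteration count over independent perturbations of one input."""
    rng = random.Random(seed)
    counts = [
        kmeans_iterations(perturb(points, sigma, rng), k, rng)
        for _ in range(trials)
    ]
    return sum(counts) / trials


if __name__ == "__main__":
    # Hypothetical input: 50 points in the unit square.
    base = [[i / 50.0, (i * 7 % 50) / 50.0] for i in range(50)]
    print(estimate_smoothed_iterations(base, k=3, sigma=0.05))
```

Rerunning the estimate for several values of sigma gives a rough empirical picture of how the iteration count depends on the perturbation size for one particular input.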

In this work the authors prove that the expected number of iterations of k‑means on a σ‑Gaussian‑perturbed input is bounded by a polynomial in the number of data points n and the inverse of the perturbation magnitude 1/σ. Formally, writing T for the number of iterations, they show that there exist constants c₁ and c₂ such that
 E[T] = O(n^{c₁} · (1/σ)^{c₂}).

