Further heuristics for $k$-means: The merge-and-split heuristic and the $(k,l)$-means


Finding the optimal $k$-means clustering is NP-hard in general, and many heuristics have been designed to monotonically minimize the $k$-means objective. We first show how to extend Lloyd's batched relocation heuristic and Hartigan's single-point relocation heuristic to take into account empty-cluster and single-point-cluster events, respectively. These events tend to occur increasingly often as $k$ or $d$ grows, or when performing several restarts. First, we show that these special events are a blessing, because they allow one to partially re-seed some cluster centers while further minimizing the $k$-means objective function. Second, we describe a novel heuristic, merge-and-split $k$-means, that consists of merging two clusters and then splitting the merged cluster again with two new centers, provided this improves the $k$-means objective. This heuristic can improve Hartigan's $k$-means after it has converged to a local minimum, and we show empirically that it improves over Hartigan's heuristic, the {\em de facto} method of choice. Finally, we propose the $(k,l)$-means objective, which generalizes the $k$-means objective by associating each data point with its $l$ closest cluster centers, and show how to either directly convert or iteratively relax a $(k,l)$-means solution into a $k$-means solution in order to reach better local minima.


💡 Research Summary

The paper tackles three fundamental shortcomings of classic k‑means clustering: (1) the occurrence of empty clusters in Lloyd’s batch algorithm, (2) the presence of single‑point clusters in Hartigan’s point‑wise relocation, and (3) the strong dependence on initialization that often traps the algorithm in poor local minima.

First, the authors reinterpret empty‑cluster events (ECEs) not as failures but as opportunities for partial reseeding. When Lloyd’s iteration produces an empty cluster, they replace its centroid with a new seed generated by any standard seeding method (e.g., k‑means++, global k‑means). This “partial reseeding” step keeps the number of clusters constant while potentially lowering the objective dramatically. Empirical analysis on the Iris dataset over one million random restarts shows that the frequency of ECEs grows with both the number of clusters k and the dimensionality d, and that applying the reseeding step yields consistently lower final costs compared to the vanilla Lloyd’s algorithm.
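The reseeding idea above can be sketched as a small modification of Lloyd's update step. The code below is an illustrative NumPy sketch, not the authors' exact procedure: the function names are hypothetical, and the reseeding distribution (drawing a point proportionally to its squared distance to the nearest current center, in the style of k-means++) is one of the standard seeding choices the paper says can be plugged in.

```python
import numpy as np

def kmeans_cost(X, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return float(np.sum((X - centers[labels]) ** 2))

def lloyd_with_reseeding(X, k, iters=50, rng=None):
    """Lloyd's batch k-means where an empty cluster is re-seeded
    (k-means++-style draw) instead of being dropped."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assignment step: squared distances to all centers, pick nearest.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step, with partial reseeding on empty-cluster events.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                # Empty-cluster event: draw a replacement seed with
                # probability proportional to the squared distance to
                # the nearest center (k-means++-style reseeding).
                p = d2.min(axis=1)
                centers[j] = X[rng.choice(len(X), p=p / p.sum())]
            else:
                centers[j] = members.mean(axis=0)
    # Final assignment with the converged centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)
```

Since the replacement seed is a data point far from all surviving centers, the reseeding step can only help the next assignment round, which is why the paper views these events as an opportunity rather than a failure mode.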

Second, the paper addresses single‑point cluster exceptions (SPCEs) in Hartigan’s algorithm. A cluster containing a single point has zero variance, and Hartigan’s single‑point moves cannot relocate its point without increasing the total within‑cluster sum of squares. The authors propose a merge‑and‑split (M&S) operation: the singleton is merged with a neighboring cluster, and the merged set is then split into two new clusters using a local 2‑means optimization. The operation is performed only if it reduces the overall k‑means objective. Experiments demonstrate that after Hartigan’s algorithm converges, applying M&S can further reduce the objective by 2–5 % on average, effectively escaping local minima that Hartigan’s point‑wise moves cannot leave.
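The merge-and-split move can be sketched as follows. This is a hedged, illustrative implementation: the helper names are hypothetical, and for simplicity it tries every cluster pair (the paper's version targets singleton clusters and their neighbors), accepting a move only when the global objective strictly decreases.

```python
import numpy as np
from itertools import combinations

def kmeans_cost(X, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return float(np.sum((X - centers[labels]) ** 2))

def two_means(X, iters=20, rng=None):
    """Local 2-means used to re-split a merged cluster."""
    rng = np.random.default_rng(rng)
    c = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    for _ in range(iters):
        lab = ((X[:, None] - c[None]) ** 2).sum(-1).argmin(1)
        for j in (0, 1):
            if np.any(lab == j):
                c[j] = X[lab == j].mean(0)
    lab = ((X[:, None] - c[None]) ** 2).sum(-1).argmin(1)
    return c, lab

def merge_and_split(X, centers, labels, rng=None):
    """For each cluster pair: merge, re-split with local 2-means, and
    keep the move only if the global k-means cost strictly decreases."""
    centers, labels = centers.copy(), labels.copy()
    best = kmeans_cost(X, centers, labels)
    for a, b in combinations(range(len(centers)), 2):
        mask = (labels == a) | (labels == b)
        if mask.sum() < 2:
            continue
        c2, lab2 = two_means(X[mask], rng=rng)
        trial_centers = centers.copy()
        trial_centers[a], trial_centers[b] = c2[0], c2[1]
        trial_labels = labels.copy()
        trial_labels[mask] = np.where(lab2 == 0, a, b)
        cost = kmeans_cost(X, trial_centers, trial_labels)
        if cost < best:  # accept only strictly improving moves
            centers, labels, best = trial_centers, trial_labels, cost
    return centers, labels, best
```

Because a rejected move leaves the partition untouched, the operation can never increase the objective, which makes it a safe post-processing step after Hartigan's algorithm has converged.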

Third, the authors introduce a generalized objective, (k,l)‑means, where each data point is assigned to its l nearest centroids rather than just the nearest one; the cost is the sum of squared distances to those l centroids. Because the objective landscape is smoother for larger l, gradually decreasing l from k down to 1 makes it easier to avoid poor local minima. Two strategies are presented: (a) a direct conversion that collapses a (k,l)‑means solution to a standard k‑means solution, and (b) an iterative relaxation where l is decreased step by step, re‑optimizing at each stage. Both approaches reduce sensitivity to the initial seed and consistently achieve lower final k‑means costs than running Lloyd’s or Hartigan’s from scratch. In particular, starting with l = 2 and gradually reducing to l = 1 yields solutions that are 3–7 % better than the best known baselines on a variety of synthetic and real‑world datasets.
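The iterative-relaxation strategy can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the paper's exact algorithm: the function names are hypothetical, each center is updated as the mean of all points that rank it among their l nearest (a natural Lloyd-style generalization), and each stage warm-starts the next as l shrinks to 1, where the procedure reduces to plain k-means.

```python
import numpy as np

def kl_means(X, k, l, iters=30, centers=None, rng=None):
    """(k,l)-means sketch: every point contributes to its l nearest
    centers. With l=1 this reduces to ordinary Lloyd's k-means."""
    rng = np.random.default_rng(rng)
    if centers is None:
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)   # (n, k) distances
        nearest = np.argsort(d2, axis=1)[:, :l]            # l closest centers
        for j in range(k):
            # Mean of every point that lists center j among its l nearest.
            mask = np.any(nearest == j, axis=1)
            if mask.any():
                centers[j] = X[mask].mean(0)
    d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)
    nearest = np.argsort(d2, axis=1)[:, :l]
    cost = float(np.take_along_axis(d2, nearest, axis=1).sum())
    return centers, cost

def relaxed_kmeans(X, k, l_start=2, rng=None):
    """Iterative relaxation: solve (k,l)-means for decreasing l,
    warm-starting each stage; the final l=1 stage is plain k-means."""
    centers = None
    for l in range(l_start, 0, -1):
        centers, cost = kl_means(X, k, l, centers=centers, rng=rng)
    return centers, cost
```

The warm start is the point of the construction: the smoother l = 2 stage positions the centers before the final l = 1 stage commits to a hard partition, which is the annealing effect the paper credits for the improved local minima.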

The experimental section validates all three contributions on several benchmarks: the classic Iris dataset, high‑dimensional synthetic data, and a large image‑feature collection. The results confirm that (i) partial reseeding of empty clusters improves Lloyd’s convergence, (ii) the merge‑and‑split heuristic refines Hartigan’s final partitions, and (iii) (k,l)‑means with progressive relaxation leads to superior local minima. Complexity analysis shows that each enhancement adds at most a modest overhead, preserving the overall O(ndk) runtime characteristic of standard k‑means.

In summary, the paper provides a cohesive set of practical enhancements: treating empty and singleton clusters as exploitable events, introducing a merge‑and‑split refinement, and generalizing the objective to (k,l)‑means. These techniques collectively make k‑means clustering more robust to initialization, more capable of escaping local minima, and applicable to higher‑dimensional and larger‑scale problems across domains such as image compression, text topic modeling, and biological data analysis.

