Clustering to Maximize the Ratio of Split to Diameter

Let G = (V, E) be a weighted complete graph, where V is the set of n objects to be clustered and the weight d(u, v) of an edge (u, v) in E is the dissimilarity between objects u and v. The diameter of a cluster is the maximum dissimilarity between pairs of objects in the cluster, and the split of a cluster is the minimum dissimilarity between objects within the cluster and objects outside the cluster. In this paper, we propose a new criterion for measuring the goodness of a partition into k clusters: the ratio of the minimum split to the maximum diameter, which is to be maximized. For k = 2, we present an exact algorithm. For k >= 3, we prove that the problem is NP-hard and present a 2-approximation algorithm, provided that the weights associated with E satisfy the triangle inequality. The worst-case runtime of both algorithms is O(n^3). We compare the proposed algorithms with the Normalized Cut by applying them to image segmentation. The experimental results on both natural and synthetic images demonstrate the effectiveness of the proposed algorithms.


💡 Research Summary

The paper introduces a clustering quality measure based on the ratio of split to diameter. For a weighted complete graph G = (V, E) with dissimilarities d(u, v), the diameter of a cluster C is the maximum dissimilarity between any two objects in C, while the split of C is the minimum dissimilarity between an object in C and an object outside C. The objective is to partition V into k clusters so that the ratio of the minimum split to the maximum diameter, each taken over all k clusters, is maximized. This criterion simultaneously encourages tight intra-cluster cohesion (small diameter) and strong inter-cluster separation (large split), addressing a gap left by traditional objectives such as minimizing intra-cluster variance, cut size, or the Normalized Cut.
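To make the definitions concrete, the following minimal Python sketch evaluates the criterion for a given partition. The function name `split_diameter_ratio`, the list-of-lists matrix format, and the convention of returning infinity when every cluster is a singleton are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations

def split_diameter_ratio(d, clusters):
    """Return (min_split, max_diameter, ratio) for a partition.

    d        -- symmetric n x n dissimilarity matrix (list of lists)
    clusters -- disjoint lists of vertex indices that together cover 0..n-1
    """
    label = {}
    for c, members in enumerate(clusters):
        for v in members:
            label[v] = c

    max_diameter = 0.0
    min_split = float("inf")
    for u, v in combinations(range(len(d)), 2):
        if label[u] == label[v]:
            # Pair inside one cluster: contributes to that cluster's diameter.
            max_diameter = max(max_diameter, d[u][v])
        else:
            # Pair across clusters: contributes to both clusters' splits.
            min_split = min(min_split, d[u][v])

    # Assumed convention: all-singleton partitions have diameter 0 and ratio inf.
    ratio = float("inf") if max_diameter == 0 else min_split / max_diameter
    return min_split, max_diameter, ratio

# Four points on a line at positions 0, 1, 5, 6.
pos = [0, 1, 5, 6]
d = [[abs(a - b) for b in pos] for a in pos]
print(split_diameter_ratio(d, [[0, 1], [2, 3]]))
```

For this toy instance the partition {0, 1}, {2, 3} has maximum diameter 1 and minimum split 4, so the ratio is 4.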

Exact algorithm for k = 2
The authors first consider the case of two clusters. They recast the optimization as a decision problem: given a candidate ratio λ, does there exist a bipartition whose split/diameter ratio is at least λ? By scaling each edge weight with λ and constructing an auxiliary graph, the decision reduces to checking whether a minimum s-t cut separates the graph while respecting the scaled constraints, which standard max-flow/min-cut procedures solve in O(n³) time. A binary search over λ then yields the optimal ratio. The stated worst-case bound for the complete algorithm is O(n³).
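For very small inputs, the optimum for k = 2 can be checked by exhaustive search. The sketch below is only such a reference check, running in exponential time; it is not the O(n³) algorithm described above, and the names `ratio` and `best_bipartition_bruteforce` are hypothetical.

```python
from itertools import combinations

def ratio(d, side):
    """Minimum split over maximum diameter for the bipartition (side, rest)."""
    max_diam, min_split = 0.0, float("inf")
    for u, v in combinations(range(len(d)), 2):
        if (u in side) == (v in side):
            max_diam = max(max_diam, d[u][v])    # same cluster
        else:
            min_split = min(min_split, d[u][v])  # across the two clusters
    # Assumed convention: two singletons give diameter 0 and ratio inf.
    return float("inf") if max_diam == 0 else min_split / max_diam

def best_bipartition_bruteforce(d):
    """Try every bipartition, with vertex 0 fixed on one side to avoid
    counting each bipartition twice. Runs in O(2^n * n^2) time."""
    n = len(d)
    best_ratio, best_side = -1.0, None
    others = list(range(1, n))
    for size in range(0, n - 1):  # the other side must stay non-empty
        for extra in combinations(others, size):
            side = {0, *extra}
            r = ratio(d, side)
            if r > best_ratio:
                best_ratio, best_side = r, side
    return best_ratio, best_side

# Four points on a line at positions 0, 1, 5, 6.
pos = [0, 1, 5, 6]
d = [[abs(a - b) for b in pos] for a in pos]
print(best_bipartition_bruteforce(d))
```

A check like this is useful mainly for validating a faster implementation on random instances with a handful of points.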

NP‑hardness for k ≥ 3
For three or more clusters the problem becomes computationally intractable. The paper proves NP‑hardness via a polynomial‑time reduction from a known NP‑complete problem (e.g., 3‑Partition or Graph Coloring). By carefully assigning distances so that points sharing the same “color” have very small mutual distances and points of different colors have large distances, the existence of a clustering with split/diameter ratio greater than 1 is shown to be equivalent to a feasible coloring. Consequently, no polynomial‑time algorithm can guarantee optimality unless P = NP.

2‑approximation under the triangle inequality
When the edge weights satisfy the triangle inequality (i.e., they form a metric), the authors present a simple approximation algorithm with a factor of 2. The algorithm computes a Minimum Spanning Tree (MST) of the whole graph, then removes the (k − 1) heaviest edges, thereby partitioning the MST into k sub-trees. Each sub-tree becomes a cluster. Every edge left inside a sub-tree is no heavier than any removed edge, and the lightest removed edge equals the minimum split of the resulting partition, because a lighter edge between two sub-trees could replace a removed edge and yield a cheaper spanning tree. With the triangle inequality as a precondition, the authors prove that the resulting split/diameter ratio is at least half of the optimal ratio, yielding a 2-approximation. On a complete graph, Prim's algorithm builds the MST in O(n²) time and selecting the heaviest edges costs O(n log n), so the procedure stays within the O(n³) worst-case bound reported for both algorithms.
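The MST-based procedure as summarized here can be sketched in a few lines of Python. This is an illustration of the steps described in this summary, not the authors' implementation; the function name `mst_k_clusters`, the dense-matrix input, and the arbitrary tie-breaking among equal-weight edges are assumptions, and the sketch does not itself establish the approximation guarantee.

```python
def mst_k_clusters(d, k):
    """Cut the k-1 heaviest MST edges and return the resulting clusters."""
    n = len(d)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")

    # Prim's algorithm on a dense matrix: O(n^2).
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge weight into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []                 # MST edges as (weight, u, v)
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((d[u][parent[u]], parent[u], u))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v], parent[v] = d[u][v], u

    # Keep the n-k lightest MST edges, i.e. drop the k-1 heaviest.
    edges.sort()
    kept = edges[: n - k]

    # Union-find to read off the connected components of the kept edges.
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Five points on a line at positions 0, 1, 5, 6, 20.
pos = [0, 1, 5, 6, 20]
d = [[abs(a - b) for b in pos] for a in pos]
print(mst_k_clusters(d, 3))
```

On this toy input the MST edges have weights 1, 1, 4, and 14; removing the two heaviest leaves the clusters {0, 1}, {2, 3}, and {4}.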

Experimental evaluation
The paper validates the proposed methods on image segmentation tasks, comparing them against the widely used Normalized Cut technique. Two datasets are employed: (1) natural photographs with complex textures and (2) synthetic images with clearly defined regions. Evaluation metrics include Intersection‑over‑Union (IoU) against manually annotated ground truth and visual assessment of boundary sharpness. Results show that the split/diameter‑based clustering often produces more accurate region boundaries, especially in scenarios where inter‑region contrast is high. Moreover, the runtime of both the exact 2‑cluster algorithm and the 2‑approximation for k ≥ 3 remains within the O(n³) bound, and in practice is comparable to or faster than Normalized Cut implementations.
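For reference, Intersection-over-Union between a predicted region and a ground-truth region can be computed as below. This is a generic sketch of the metric, not the paper's evaluation code; the nested-list mask format and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def iou(pred, truth):
    """Intersection-over-Union of two equally sized binary masks,
    each given as a nested list of 0/1 values."""
    inter = union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0  # pixel in both regions
            union += 1 if (p or t) else 0   # pixel in at least one region
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union

pred = [[1, 1], [0, 0]]
truth = [[1, 0], [1, 0]]
print(iou(pred, truth))  # one shared pixel out of three covered pixels
```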

Contributions and impact

  1. Introduction of a new, interpretable clustering objective that jointly captures intra‑cluster compactness and inter‑cluster separation.
  2. An O(n³) exact algorithm for the two‑cluster case, based on a reduction to min‑cut and binary search on the ratio.
  3. Proof of NP‑hardness for three or more clusters, establishing the theoretical limits of exact optimization.
  4. A metric‑aware 2‑approximation algorithm that is both simple to implement and provably close to optimal.
  5. Empirical evidence that the proposed approach outperforms Normalized Cut in image segmentation, suggesting broader applicability to any domain where a clear separation between groups is desired (e.g., document clustering, bio‑informatics, social network analysis).

Overall, the work bridges a conceptual gap in clustering research by providing a rigorous formulation, complexity analysis, and practical algorithms for a ratio‑based quality measure, and it demonstrates that this measure can lead to superior results in real‑world segmentation tasks.