Clustering under Constraints: Efficient Parameterized Approximation Schemes
We present a unified framework that yields EPASes for constrained $(k,z)$-clustering in metric spaces of bounded (algorithmic) scatter dimension, a notion introduced by Abbasi et al. (FOCS 2023). They showed that several well-known metric families, including continuous Euclidean spaces, bounded doubling spaces, planar metrics, and bounded treewidth metrics, have bounded scatter dimension. Subsequently, Bourneuf and Pilipczuk (SODA 2025) proved that this also holds for metrics induced by graphs from any fixed proper minor-closed class. Our result, in particular, addresses a major open question of Abbasi et al., whose approach to $k$-clustering in such metrics was inherently limited to \emph{Voronoi-based} objectives, where each point is connected only to its nearest chosen center. As a consequence, we obtain EPASes for several constrained clustering problems, including capacitated and matroid $(k,z)$-clustering, fault-tolerant and fair $(k,z)$-clustering, as well as for metrics of bounded highway dimension. In particular, our results on capacitated and fair $k$-Median and $k$-Means provide the first EPASes for these problems across broad families of structured metrics. Previously, such results were known only in continuous Euclidean spaces, due to the works of Cohen-Addad and Li (ICALP 2019) and Bandyapadhyay, Fomin, and Simonov (ICALP 2021; JCSS 2024), respectively. Along the way, we also obtain faster EPASes for uncapacitated $k$-Median and $k$-Means, improving upon the running time of the algorithm by Abbasi et al. (FOCS 2023).
💡 Research Summary
The paper introduces a unified algorithmic framework that yields Efficient Parameterized Approximation Schemes (EPASes) for a broad class of constrained (k, z)-clustering problems in metric spaces whose (algorithmic) scatter dimension is bounded. Scatter dimension, originally defined by Abbasi et al. (FOCS 2023), measures the length of the longest ε-scattering sequence: a chain of center-point pairs satisfying specific distance relationships. While the original definition was purely combinatorial, this work distinguishes the "algorithmic scatter dimension," i.e., the longest sequence that can be produced by a concrete ball-intersection algorithm. This refinement is essential for designing algorithms that exploit the structural limitations of the metric.
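To make the notion concrete, here is a small Python sketch of a checker for one plausible reading of the ε-scattering condition (the exact inequalities in the paper's definition may differ; the ones below are an illustrative paraphrase, not the authors' formal statement): each new center must lie within distance 1 of every earlier point in the chain, yet fail to cover its own paired point by a (1 + ε) margin. Bounded scatter dimension then says that every such chain has bounded length.

```python
def is_eps_scattering(pairs, dist, eps):
    """Check whether a sequence of (center, point) pairs forms an
    eps-scattering sequence.

    NOTE: hypothetical paraphrase of the definition from Abbasi et al.:
    center x_i must be within distance 1 of every earlier point p_j
    (j < i), but must miss its own point p_i by more than 1 + eps.
    """
    for i, (x_i, p_i) in enumerate(pairs):
        if dist(x_i, p_i) <= 1 + eps:   # x_i covers its own point: not allowed
            return False
        for j in range(i):
            _, p_j = pairs[j]
            if dist(x_i, p_j) > 1:      # x_i fails to cover an earlier point
                return False
    return True

# Toy 1-D Euclidean metric for experimentation.
d = lambda a, b: abs(a - b)
```

For example, on the real line with ε = 0.1, the chain [(1.2, 0.0), (0.5, 2.0)] satisfies the condition above, while a center at distance 0.5 from its own point does not.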
The framework rests on three key components:
- Coreset Construction (Algorithm A_C). For any instance, a weighted coreset of size O(k·log n·ε⁻¹) is built, preserving both the objective value and all assignment constraints (capacities, fairness ratios, matroid independence, fault tolerance, outliers, etc.). The coreset reduces the problem size while guaranteeing that any feasible solution on the coreset can be lifted to a near-optimal solution on the original data.
- Ball-Intersection Procedure (Algorithm A_B). Using the bounded scatter dimension, A_B efficiently enumerates a small family of candidate center sets. It exploits the fact that in a space of constant scatter dimension, the number of distinct "relevant" balls intersecting a given region is limited, which avoids a combinatorial explosion.
- Assignment Optimizer (Algorithm A_A). Given a candidate set of k centers and a weighted point set, A_A computes a feasible assignment that respects all imposed constraints. Depending on the constraint type, this step reduces to a linear program, a matroid intersection, or a simple greedy allocation, all solvable in time polynomial in the coreset size.

The framework is generic: any constrained (k, z)-clustering problem for which A_C, A_B, and A_A exist falls under its scope. The authors verify these conditions for several important constraint families:
- Capacitated clustering – each facility has a capacity η; fractional assignments are allowed.
- Fair clustering – each color class must appear in each cluster within a prescribed interval.
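The assignment step (A_A) is the easiest component to illustrate. The following sketch shows the "simple greedy allocation" flavor for the capacitated case: each point goes to the nearest center that still has residual capacity. This is illustrative only; the paper's actual A_A may instead solve a linear program or a matroid intersection, and all names here are hypothetical.

```python
def greedy_capacitated_assign(points, centers, capacities, dist):
    """Illustrative greedy allocation for a capacitated assignment step.

    Assigns each point (processed in input order) to the nearest center
    with residual capacity. Hypothetical sketch, not the paper's A_A.
    """
    residual = list(capacities)
    assignment = {}
    for p in points:
        # Among centers with spare capacity, pick the closest one.
        best = min(
            (c for c in range(len(centers)) if residual[c] > 0),
            key=lambda c: dist(p, centers[c]),
        )
        assignment[p] = best
        residual[best] -= 1
    return assignment

# Toy 1-D metric.
d = lambda a, b: abs(a - b)
```

A real implementation would also order the points (or re-optimize) to control the assignment cost; the point-by-point greedy rule here only demonstrates feasibility with respect to the capacities.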