Robust Clustering Using Outlier-Sparsity Regularization
Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the ability of these algorithms to identify meaningful hidden structures, rendering their outcome unreliable. This paper develops robust clustering algorithms that aim not only to cluster the data, but also to identify the outliers. The novel approaches rely on the infrequent presence of outliers in the data, which translates to sparsity in a judiciously chosen domain. Capitalizing on the sparsity in the outlier domain, outlier-aware robust K-means and probabilistic clustering approaches are proposed. Their novelty lies in identifying outliers while effecting sparsity in the outlier domain through carefully chosen regularization. A block coordinate descent approach is developed to obtain iterative algorithms with convergence guarantees and small excess computational complexity with respect to their non-robust counterparts. Kernelized versions of the robust clustering algorithms are also developed to efficiently handle high-dimensional data, identify nonlinearly separable clusters, or even cluster objects that are not represented by vectors. Numerical tests on both synthetic and real datasets validate the performance and applicability of the novel algorithms.
💡 Research Summary
The paper addresses a fundamental weakness of popular clustering methods—namely, their extreme sensitivity to a small number of outliers. Both K‑means and Gaussian mixture model (GMM) clustering rely on Euclidean distances or likelihoods that are heavily influenced by large residuals, so even a few anomalous points can dramatically shift centroids and degrade cluster assignments. To overcome this, the authors introduce an explicit outlier vector oₙ for each datum, augmenting the standard data model to xₙ = Σ_c uₙc m_c + oₙ + vₙ, where uₙc denotes the (hard or soft) cluster membership, m_c the cluster center, vₙ Gaussian noise, and oₙ is non‑zero only for outliers. The key observation is that most oₙ are zero, which translates the rarity of outliers into sparsity in the outlier domain.
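To make the data model concrete, here is a minimal synthetic-data sketch in Python. The dimensions, noise scales, and the 5 % outlier fraction are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N points, C clusters, p features (not from the paper).
N, C, p = 300, 3, 2
M = rng.normal(scale=5.0, size=(C, p))               # cluster centers m_c
labels = rng.integers(0, C, size=N)                  # hard memberships u_nc
X = M[labels] + rng.normal(scale=0.5, size=(N, p))   # x_n = m_c + v_n

# Sparse outlier matrix O: most rows are exactly zero; a 5 % fraction
# of the points receive a large non-zero outlier vector o_n.
O = np.zeros((N, p))
outlier_idx = rng.choice(N, size=int(0.05 * N), replace=False)
O[outlier_idx] = rng.normal(scale=10.0, size=(len(outlier_idx), p))
X = X + O                                            # x_n = m_c + o_n + v_n
```

Only 5 % of the rows of O are non-zero here, which is exactly the row-wise sparsity that the regularization described next is designed to exploit.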
Directly minimizing the ℓ₀‑norm of the outlier matrix is NP‑hard, so the authors replace it with a convex surrogate: the group‑sparsity regularizer λ∑ₙ‖oₙ‖₂, i.e., an ℓ₁‑norm over the per‑datum outlier magnitudes. This yields the objective
J(M,O,U) = Σₙ Σ_c uₙc‖xₙ – m_c – oₙ‖₂² + λ Σₙ‖oₙ‖₂,
where λ controls the trade‑off between fitting the data and penalizing non‑zero outlier vectors. As λ → ∞ every oₙ is forced to zero and the formulation collapses to standard K‑means; at λ = 0 the penalty disappears, each oₙ absorbs its entire residual, and every point is effectively declared an outlier.
Because the problem is convex in (M,O) but jointly non‑convex in (M,O,U), the authors adopt a block coordinate descent (BCD) scheme. Three sub‑problems are solved iteratively: (i) with M and O fixed, update the assignment matrix U (hard 0‑1 constraints for K‑means, or soft constraints for fuzzy K‑means and GMM); (ii) with U fixed, update centroids M by simple weighted averages after subtracting the current outlier estimates; (iii) with M and U fixed, update each outlier vector oₙ via a group‑Lasso step, which admits a closed‑form soft‑thresholding solution. The group‑Lasso formulation ensures that either the whole vector oₙ is set to zero (non‑outlier) or it is shrunk proportionally to the residual, thereby automatically detecting outliers.
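One BCD sweep for the hard-assignment (K-means) case can be sketched as follows. The function name and single-sweep structure are our own, not the authors' reference implementation, but the three updates mirror steps (i)–(iii) above; the soft-thresholding factor follows from minimizing ‖rₙ − oₙ‖₂² + λ‖oₙ‖₂ in closed form:

```python
import numpy as np

def robust_kmeans_step(X, M, O, lam):
    """One block-coordinate-descent sweep of outlier-aware K-means (sketch).

    X : (N, p) data, M : (C, p) centroids, O : (N, p) outlier vectors,
    lam : group-lasso weight λ.
    """
    # (i) Assignment: nearest centroid after removing current outlier estimate.
    R = X - O                                          # outlier-compensated data
    d2 = ((R[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)

    # (ii) Centroid update: mean of outlier-compensated points per cluster.
    for c in range(M.shape[0]):
        members = labels == c
        if members.any():
            M[c] = R[members].mean(axis=0)

    # (iii) Outlier update via group-lasso soft-thresholding:
    #       o_n = r_n * max(0, 1 - λ / (2 ||r_n||₂)),  r_n = x_n - m_{c(n)}.
    resid = X - M[labels]
    norms = np.linalg.norm(resid, axis=1, keepdims=True)
    shrink = np.maximum(0.0, 1.0 - lam / (2.0 * np.maximum(norms, 1e-12)))
    O = resid * shrink
    return M, O, labels
```

The whole-vector shrinkage is what makes detection automatic: points whose residual norm stays below λ/2 get oₙ = 0 exactly, while larger residuals yield non-zero oₙ that flag the point as an outlier.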
For the probabilistic case, the authors embed the same outlier model into a GMM with a common covariance Σ. The regularized negative log‑likelihood becomes
−L(Θ) + λ Σₙ‖oₙ‖_{Σ⁻¹},
and an EM‑style algorithm is derived: the E‑step computes posterior responsibilities γₙc, while the M‑step updates means, the shared covariance, and outlier vectors using the same ℓ₁‑penalized least‑squares update. The common covariance prevents unbounded likelihoods that would otherwise arise from letting a single component collapse onto an outlier.
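A minimal sketch of the E-step plus the penalized outlier update follows. Function and variable names are our assumptions, Σ is held fixed (the mean/covariance M-step updates are omitted for brevity), and since Σ_c γₙc = 1 the outlier update reduces to shrinking the γ-weighted residual rₙ = xₙ − Σ_c γₙc m_c in the Σ⁻¹ (Mahalanobis) norm:

```python
import numpy as np

def em_outlier_updates(X, M, Sigma, pi, O, lam):
    """E-step responsibilities plus penalized outlier update (sketch)."""
    N, p = X.shape
    C = M.shape[0]
    Sinv = np.linalg.inv(Sigma)
    log_det = np.linalg.slogdet(Sigma)[1]

    # E-step: posterior responsibilities γ_nc, using outlier-compensated data.
    logp = np.empty((N, C))
    for c in range(C):
        d = (X - O) - M[c]
        mahal = np.einsum('ij,jk,ik->i', d, Sinv, d)
        logp[:, c] = np.log(pi[c]) - 0.5 * (mahal + log_det + p * np.log(2 * np.pi))
    gamma = np.exp(logp - logp.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Outlier update: soft-threshold the γ-weighted residual in the Σ⁻¹ norm:
    #   o_n = r_n * max(0, 1 - λ / (2 ||r_n||_{Σ⁻¹})),  r_n = x_n - Σ_c γ_nc m_c.
    R = X - gamma @ M
    mnorm = np.sqrt(np.einsum('ij,jk,ik->i', R, Sinv, R))
    shrink = np.maximum(0.0, 1.0 - lam / (2.0 * np.maximum(mnorm, 1e-12)))
    O_new = R * shrink[:, None]
    return gamma, O_new
```

Compared with the K-means update, the only structural change is that hard memberships become responsibilities and the Euclidean norm becomes a Mahalanobis norm induced by the shared covariance.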
Computationally, each BCD/EM iteration requires O(N C p) operations—the same order as vanilla K‑means or EM—plus a cheap ℓ₂‑norm thresholding per datum. Hence the robust algorithms incur negligible overhead.
The authors further kernelize the approach. By mapping data to a high‑dimensional feature space Φ(·) and replacing inner products with a kernel K(xᵢ,xⱼ), the same updates can be performed without explicit feature vectors. Outlier vectors are treated identically in feature space, and the group‑Lasso step becomes a kernel‑matrix operation. This kernel extension enables clustering of non‑linearly separable data, images, graphs, or any objects for which a kernel can be defined.
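The key identity behind kernelization is that the squared feature-space distance to a cluster mean needs only kernel entries: ‖Φ(xₙ) − m_c‖² = Kₙₙ − (2/|c|) Σ_{j∈c} Kₙⱼ + (1/|c|²) Σ_{j,l∈c} Kⱼₗ. The sketch below shows this assignment step for plain kernel K-means (the robust variant would additionally subtract the feature-space outlier terms, which we omit here):

```python
import numpy as np

def kernel_kmeans_assign(K, labels, C):
    """Reassign points using feature-space distances computed purely from
    the (N, N) kernel matrix K, with no explicit feature vectors."""
    N = K.shape[0]
    d2 = np.zeros((N, C))
    for c in range(C):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            d2[:, c] = np.inf            # empty cluster attracts no points
            continue
        # ||Φ(x_n) - m_c||² = K_nn - 2/|c| Σ_j K_nj + 1/|c|² Σ_{j,l} K_jl
        d2[:, c] = (np.diag(K)
                    - 2.0 * K[:, idx].mean(axis=1)
                    + K[np.ix_(idx, idx)].mean())
    return d2.argmin(axis=1)
```

With a linear kernel K = XXᵀ this reproduces ordinary K-means assignments; swapping in, say, a Gaussian kernel changes the geometry without changing the code, which is what lets the method handle nonlinearly separable clusters or non-vector objects.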
Extensive experiments on synthetic 2‑D data, MNIST handwritten digits (with added random pixel outliers), and social network adjacency matrices (with injected fake nodes) demonstrate the benefits. With as few as 5 % outliers, the robust K‑means and robust GMM achieve 10–30 % higher clustering accuracy, lower within‑cluster variance, and superior outlier detection rates compared to standard methods. The kernelized versions successfully recover non‑linear cluster structures while maintaining robustness.
In summary, the paper leverages sparsity‑driven regularization—originally popular in compressive sensing—to create robust clustering algorithms that simultaneously estimate cluster parameters and identify outliers. The proposed methods are mathematically sound (convergence guarantees, complexity analysis) and practically efficient (closed‑form updates, kernel compatibility), representing a significant contribution to the field of unsupervised learning.