Improving k-Means Clustering Performance with Disentangled Internal Representations


💡 Research Summary

The paper proposes a simple yet effective method to improve k‑means clustering by learning disentangled latent representations through an autoencoder regularized with a soft nearest neighbor loss (SNNL). Traditional deep clustering approaches such as Deep Embedded Clustering (DEC), Variational Deep Embedding (VaDE), and ClusterGAN jointly train an encoder and a separate clustering network, often requiring complex loss functions and adversarial objectives. In contrast, the authors use a single fully‑connected autoencoder to learn a compact representation and augment the standard reconstruction loss with a term that explicitly minimizes “entanglement” – the degree to which points from different classes are intermixed in the latent space.
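The resulting training objective is simply the reconstruction loss plus a weighted entanglement penalty on the latent codes. The following NumPy sketch illustrates this composite objective; the function names, the weighting factor `alpha`, and the fixed temperature `T` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def bce(x, x_hat, eps=1e-7):
    # Binary cross-entropy reconstruction loss (inputs assumed in [0, 1]).
    x_hat = np.clip(x_hat, eps, 1 - eps)
    return -np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

def snnl(z, y, T=1.0, eps=1e-7):
    # Soft nearest neighbor loss on latent codes z with class labels y:
    # high when points from different classes are intermixed.
    d = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    w = np.exp(-d / T)
    np.fill_diagonal(w, 0.0)                            # exclude self-pairs
    same = (y[:, None] == y[None, :]).astype(float)
    p = (w * same).sum(1) / (w.sum(1) + eps)            # prob. a neighbor shares the label
    return -np.mean(np.log(p + eps))

def total_loss(x, x_hat, z, y, alpha=10.0, T=1.0):
    # Composite objective: reconstruction + weighted entanglement term.
    # alpha is an assumed value for illustration only.
    return bce(x, x_hat) + alpha * snnl(z, y, T)
```

In this sketch, latent configurations where same-class points sit close together yield a lower `snnl` value, so minimizing the composite loss pushes the autoencoder toward cluster-friendly representations.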

Entanglement is quantified using the soft nearest neighbor loss, a temperature-scaled generalization of Neighborhood Component Analysis. For each sample, the loss computes the probability that a randomly chosen neighbor belongs to the same class, with distances weighted by a temperature parameter T. A low temperature emphasizes close neighbors, while a high temperature gives weight to distant points. The authors extend this idea by introducing an annealing schedule for T: T = 1/(η + i)^γ, where i is the current epoch, η = 1, and γ = 0.55. Because T decreases as training proceeds, the loss focuses increasingly on fine-grained local structure in later epochs, progressively encouraging class-wise separation in the latent space.
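The annealing schedule itself is a one-liner; a minimal sketch using the stated values η = 1 and γ = 0.55 (the function name is ours):

```python
def temperature(i, eta=1.0, gamma=0.55):
    # Annealed SNNL temperature: T = 1 / (eta + i)**gamma.
    # With eta = 1 and gamma = 0.55, T starts at 1.0 at epoch 0
    # and decays monotonically as the epoch index i grows.
    return 1.0 / (eta + i) ** gamma
```

Plugging in a few epochs shows the decay: T drops from 1.0 at epoch 0 to roughly 0.68 at epoch 1, so the loss quickly shifts its emphasis toward each point's nearest neighbors.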

The autoencoder architecture mirrors classic deep autoencoders: d → 500 → 500 → 2000 → c for the encoder and c → 2000 → 500 → 500 → d for the decoder, with ReLU activations in the hidden layers and sigmoid activations in the latent and output layers. The latent dimensionality c is fixed at 70 for all experiments, providing enough capacity to capture class structure while remaining low-dimensional for clustering. The reconstruction loss is binary cross-entropy (since inputs are normalized to the [0, 1] range).
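The layer widths above can be sanity-checked with a plain NumPy forward pass. This is a dimensional sketch only: the input size d = 784 (e.g., flattened 28×28 images) is an assumed example, weights are randomly initialized, and biases are omitted for brevity:

```python
import numpy as np

# Layer widths: d -> 500 -> 500 -> 2000 -> c -> 2000 -> 500 -> 500 -> d.
d, c = 784, 70  # d = 784 is an assumed example; c = 70 as in the paper.
dims = [d, 500, 500, 2000, c, 2000, 500, 500, d]

rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.01, size=(m, n)) for m, n in zip(dims[:-1], dims[1:])]

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x):
    # Returns (reconstruction, latent code). ReLU in hidden layers;
    # sigmoid at the latent layer (k == 3) and the output layer.
    h, z = x, None
    for k, W in enumerate(weights):
        h = h @ W
        h = sigmoid(h) if k in (3, len(weights) - 1) else np.maximum(h, 0.0)
        if k == 3:
            z = h
    return h, z

x = rng.random((2, d))
x_hat, z = forward(x)
```

Running k-means on the 70-dimensional codes `z` (after training) is then a drop-in step; the sigmoid output keeps reconstructions in (0, 1), matching the binary cross-entropy loss.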

