From Logits to Hierarchies: Hierarchical Clustering made Simple

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original ArXiv source.

The hierarchical structure inherent in many real-world datasets makes the modeling of such hierarchies a crucial objective in both unsupervised and supervised machine learning. While recent advancements have introduced deep architectures specifically designed for hierarchical clustering, we adopt a critical perspective on this line of research. Our findings reveal that these methods face significant limitations in scalability and performance when applied to realistic datasets. Given these findings, we present an alternative approach and introduce a lightweight method that builds on pre-trained non-hierarchical clustering models. Remarkably, our approach outperforms specialized deep models for hierarchical clustering, and it is broadly applicable to any pre-trained clustering model that outputs logits, without requiring any fine-tuning. To highlight the generality of our approach, we extend its application to a supervised setting, demonstrating its ability to recover meaningful hierarchies from a pre-trained ImageNet classifier. Our results establish a practical and effective alternative to existing deep hierarchical clustering methods, with significant advantages in efficiency, scalability and performance.


💡 Research Summary

This paper critically examines recent deep learning approaches to hierarchical clustering and demonstrates that, despite their sophisticated architectures, they struggle with scalability and leaf‑level performance on realistic, large‑scale datasets. The authors first evaluate several state‑of‑the‑art specialized models—including DeepECT, TreeVAE, and other deep hierarchical clustering frameworks—on vision benchmarks such as CIFAR‑10, CIFAR‑100, and Food‑101. Their experiments reveal that these methods require substantial computational resources (often needing GPUs for many hours) and that their clustering quality at the finest granularity is frequently inferior to that of simple flat clustering baselines.

Motivated by these shortcomings, the authors propose a lightweight algorithm called Logits to Hierarchies (L2H). The key insight is that a pre‑trained flat clustering model already encodes rich relational information in its output logits. By operating solely on these logits—without accessing internal embeddings or computing pairwise distances—L2H can infer a hierarchy of clusters with minimal overhead.

The method works as follows. Given a K‑cluster model fθ that outputs unnormalized logits for each data point, the algorithm defines two functions: hθ(x), the arg‑max cluster assignment, and gθ(x), the softmax probability of that assignment. For any subset of clusters G, a masked softmax restricts the probability mass to the complement of G, giving the probability that each point would be reassigned to every remaining cluster if G were “removed”. The algorithm then performs K−1 merge steps. At each step it selects the group G* with the lowest aggregated confidence score s(G) = ∑_{x∈D_G} gθ(x), where D_G denotes the points assigned to clusters in G. For every point originally assigned to G*, it evaluates the masked‑softmax probability of reassignment to each remaining cluster c and accumulates these probabilities into rp(c). Averaging rp(c) over the clusters of each candidate group identifies the most related group G†; G* is merged with G†, and the merge is recorded as an internal node of the hierarchy tree.

Because all operations are simple reductions over logits, L2H runs efficiently on a single CPU. The authors provide a Python implementation and report that hierarchical clustering of an ImageNet‑scale dataset (≈1 M images, 1000 classes) completes in under a minute, a stark contrast to the multi‑hour runtimes of deep hierarchical models.

Empirical results show that L2H consistently outperforms the deep baselines across several metrics: (1) Dasgupta cost, measuring overall hierarchical quality, is reduced by 10–20 %; (2) leaf‑level clustering accuracy matches or exceeds that of flat K‑means; (3) runtime and memory consumption are dramatically lower.

The paper also extends L2H to a supervised setting by applying it to the logits of a pre‑trained ImageNet classifier. The derived hierarchy recovers large portions of the WordNet taxonomy, grouping semantically related classes (e.g., dogs and cats) together. Moreover, the hierarchy highlights spurious correlations—such as unusually high reassignment probabilities between unrelated classes—offering a new lens for model interpretability and bias detection.

In summary, the contributions are threefold: (i) a thorough empirical critique of existing deep hierarchical clustering methods, exposing their scalability bottlenecks; (ii) the introduction of L2H, a simple yet powerful logit‑based hierarchy construction algorithm that requires no fine‑tuning and works with black‑box models; and (iii) a demonstration that L2H can be used for both unsupervised clustering and supervised model analysis, providing insights into class relationships and potential biases. The authors suggest future work on alternative masking strategies, extensions to non‑visual domains, and integration of hierarchical structures into downstream tasks such as data augmentation or generative modeling.

