Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery


Deep Learning models encode rich semantic information in their hidden representations. However, it remains challenging to understand which parts of this information models actually rely on when making predictions. A promising line of post-hoc concept-based explanation methods relies on clustering token representations. However, commonly used approaches such as hierarchical clustering are computationally infeasible for large-scale datasets, and K-Means often yields shallow or frequency-dominated clusters. We propose the vector quantized latent concept (VQLC) method, a framework built upon the vector quantized-variational autoencoder (VQ-VAE) architecture that learns a discrete codebook mapping continuous representations to concept vectors. We perform thorough evaluations and show that VQLC improves scalability while maintaining comparable quality of human-understandable explanations.


💡 Research Summary

The paper addresses the problem of interpreting deep neural networks by extracting human‑understandable “latent concepts” from hidden token representations. Existing post‑hoc methods typically rely on clustering these representations. Hierarchical clustering captures meaningful concepts but scales quadratically, making it impractical for large corpora, while K‑Means is computationally cheap but often yields shallow, frequency‑biased clusters that do not reflect the true geometry of the representation space.

To overcome these limitations, the authors propose Vector Quantized Latent Concepts (VQLC), a framework built on the Vector‑Quantized Variational Auto‑Encoder (VQ‑VAE). VQLC learns a discrete codebook of concept vectors and maps each token representation to its nearest codebook entry. The key idea is that each codebook vector serves as a latent concept; tokens assigned to the same vector collectively define that concept.
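The core assignment step can be sketched in a few lines of NumPy. This is an illustrative toy with hypothetical names, using squared Euclidean distance for simplicity (the paper's quantizer uses cosine distance): each token representation is mapped to the index of its nearest codebook vector, and all tokens sharing an index define one latent concept.

```python
import numpy as np

def assign_concepts(tokens, codebook):
    """Map each token representation to its nearest codebook entry.

    tokens:   (n, d) array of token representations
    codebook: (K, d) array of concept vectors
    Returns an (n,) array of concept indices.
    """
    # Pairwise squared Euclidean distances between tokens and codebook vectors
    dists = ((tokens[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

# Tokens assigned the same index collectively define one latent concept
tokens = np.array([[0.9, 0.1], [0.1, 1.0], [1.0, 0.0]])
codebook = np.array([[1.0, 0.0], [0.0, 1.0]])
print(assign_concepts(tokens, codebook))  # → [0 1 0]
```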

Architecture

  1. Encoder – A lightweight, single‑layer encoder with a learnable residual mixing mechanism. The original token representation $h_\ell(w_i)$ is linearly transformed, layer‑normalized, and then blended with the original via a learnable coefficient $\alpha \in (0, 0.5)$. This preserves semantic information while adapting the space for stable quantization.
  2. Vector Quantizer – A learnable codebook of size $K$ (set to 400 in experiments). The codebook is initialized with K‑Means centroids computed on a subset of training token embeddings, providing a structured starting point. During training, the codebook is updated with an Exponential Moving Average (EMA) scheme (decay $\lambda = 0.999$), which yields smoother, more stable updates than gradient‑based methods. Quantization uses cosine distance; during training, top‑$k$ ($k = 5$) sampling with temperature $\tau = 1.0$ prevents codebook collapse, while at inference the nearest codebook vector is selected deterministically.
  3. Decoder – Reconstructs the original encoder outputs from the quantized vectors. It first projects them to a lower dimension $d'$ with a linear down‑projection, processes them through a 4‑layer Transformer encoder to capture contextual dependencies, and finally maps back to the original dimension with a linear up‑projection.
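The pipeline above can be sketched as follows. This is an illustrative NumPy version with simplified shapes: the residual‑mixing encoder is reduced to a single fixed blend (the paper's $\alpha$ is learnable), the Transformer decoder is omitted, and all weights are random. It is a sketch of the data flow, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, k, tau = 8, 400, 5, 1.0   # toy dim; K, k, tau taken from the paper

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encode(h, W, alpha=0.3):
    """Residual-mixing encoder: blend a transformed copy with the original.
    alpha is learnable in the paper, constrained to (0, 0.5); fixed here."""
    return (1 - alpha) * h + alpha * layer_norm(h @ W)

def quantize(z, codebook, training=True):
    """Cosine-similarity quantization with top-k sampling during training."""
    zn = z / np.linalg.norm(z, axis=-1, keepdims=True)
    cn = codebook / np.linalg.norm(codebook, axis=-1, keepdims=True)
    sim = zn @ cn.T                            # cosine similarity, (n, K)
    if training:
        # Sample among the k most similar entries (temperature tau)
        top = np.argsort(-sim, axis=-1)[:, :k]
        logits = np.take_along_axis(sim, top, axis=-1) / tau
        p = np.exp(logits - logits.max(-1, keepdims=True))
        p /= p.sum(-1, keepdims=True)
        idx = np.array([rng.choice(t, p=pi) for t, pi in zip(top, p)])
    else:
        idx = sim.argmax(-1)                   # deterministic at inference
    return codebook[idx], idx

h = rng.normal(size=(3, d))                    # three token representations
W = rng.normal(size=(d, d))
z_e = encode(h, W)
z_q, idx = quantize(z_e, rng.normal(size=(K, d)), training=False)
```

In the real model the codebook would be initialized from K‑Means centroids rather than random noise, and the quantized vectors `z_q` would then be fed to the Transformer decoder for reconstruction.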

Training Objective

The loss combines a reconstruction term $L_{rec} = \|z_e - \hat{z}_e\|_2^2$ and a commitment term $L_{commit} = \|z_e - \text{sg}(z_q)\|_2^2$, where $\text{sg}(\cdot)$ denotes the stop-gradient operator, $z_e$ the encoder outputs, $\hat{z}_e$ their reconstructions, and $z_q$ the quantized vectors. Because the codebook itself is updated via EMA rather than by gradients, the total objective reduces to $L = L_{rec} + \beta \, L_{commit}$, with $\beta$ weighting the commitment term.
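A minimal NumPy sketch of this objective and the EMA codebook update is below. The weight `beta` is an assumed VQ‑VAE‑style commitment coefficient, not a value quoted from the paper, and the stop‑gradient is implicit since no autograd is involved.

```python
import numpy as np

def vqlc_loss(z_e, z_e_hat, z_q, beta=0.25):
    """Reconstruction + commitment loss (beta is an assumed weight)."""
    l_rec = ((z_e - z_e_hat) ** 2).sum(-1).mean()    # ||z_e - z_e_hat||^2
    l_commit = ((z_e - z_q) ** 2).sum(-1).mean()     # ||z_e - sg(z_q)||^2
    return l_rec + beta * l_commit

def ema_update(codebook, ema_count, ema_sum, z_e, idx, decay=0.999, eps=1e-5):
    """EMA codebook update, used instead of gradient steps on the codebook."""
    K = codebook.shape[0]
    one_hot = np.eye(K)[idx]                         # (n, K) hard assignments
    ema_count[:] = decay * ema_count + (1 - decay) * one_hot.sum(0)
    ema_sum[:] = decay * ema_sum + (1 - decay) * (one_hot.T @ z_e)
    codebook[:] = ema_sum / (ema_count[:, None] + eps)

rng = np.random.default_rng(0)
z_e = rng.normal(size=(6, 4))
codebook = rng.normal(size=(3, 4))
idx = ((z_e[:, None] - codebook[None]) ** 2).sum(-1).argmin(-1)
# With perfect reconstruction (z_e_hat = z_e), only the commitment term remains
loss = vqlc_loss(z_e, z_e, codebook[idx])
ema_update(codebook, np.ones(3), codebook.copy(), z_e, idx)
```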

