MetaCluster: Enabling Deep Compression of Kolmogorov-Arnold Network

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Kolmogorov-Arnold Networks (KANs) replace scalar weights with per-edge vectors of basis coefficients, thereby increasing expressivity and accuracy while also resulting in a multiplicative increase in parameters and memory. We propose MetaCluster, a framework that makes KANs highly compressible without sacrificing accuracy. Specifically, a lightweight meta-learner, trained jointly with the KAN, maps low-dimensional embeddings to coefficient vectors, thereby shaping them to lie on a low-dimensional manifold that is amenable to clustering. We then run K-means in coefficient space and replace per-edge vectors with shared centroids. Afterwards, the meta-learner can be discarded, and a brief fine-tuning of the centroid codebook recovers any residual accuracy loss. The resulting model stores only a small codebook and per-edge indices, exploiting the vector nature of KAN parameters to amortize storage across multiple coefficients. On MNIST, CIFAR-10, and CIFAR-100, across standard KANs and ConvKANs using multiple basis functions, MetaCluster achieves a reduction of up to $80\times$ in parameter storage, with no loss in accuracy. Similarly, on high-dimensional equation modeling tasks, MetaCluster achieves a parameter reduction of $124.1\times$, without impacting performance. Code will be released upon publication.


💡 Research Summary

Kolmogorov‑Arnold Networks (KANs) replace each scalar weight in a neural network with a vector of basis‑function coefficients, dramatically increasing expressive power for tasks such as scientific equation modeling and, more recently, computer vision. This architectural advantage comes at the cost of a multiplicative increase in parameters: if a KAN uses |w| coefficients per edge, the total parameter count is |w| times that of a comparable MLP. Traditional weight‑sharing techniques—clustering scalar weights into a codebook and storing compact indices—work well for MLPs but break down for KANs, because the coefficient vectors live in a high‑dimensional space where pairwise distances concentrate, making clustering ineffective.
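The multiplicative parameter blow-up is easy to see with a small calculation. A minimal sketch, assuming a hypothetical single dense layer (784 inputs, 128 outputs) and |w| = 8 coefficients per edge — these sizes are illustrative, not taken from the paper:

```python
# Hypothetical layer sizes for illustration.
n_in, n_out, w = 784, 128, 8

mlp_params = n_in * n_out       # one scalar weight per edge
kan_params = n_in * n_out * w   # one |w|-vector of coefficients per edge

print(mlp_params)   # 100352
print(kan_params)   # 802816 -> exactly |w| times the MLP cost
```

For any layer shape, the ratio is exactly |w|, which is what makes per-edge vector storage the dominant memory cost in a KAN.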

MetaCluster addresses this problem with a three‑stage pipeline. First, a lightweight meta‑learner Mθ maps a low‑dimensional embedding z_i∈ℝ^{d_emb} to the full coefficient vector w_i∈ℝ^{|w|}. The meta‑learner is a two‑layer MLP (linear‑ReLU‑linear) trained jointly with the KAN on the primary task loss. By forcing all w_i to lie on a low‑dimensional manifold, the high‑dimensional vectors become highly clusterable. Visualizations (t‑SNE) show that with d_emb=1 or 2 the coefficients collapse onto a line or a sheet, whereas a baseline KAN without a meta‑learner produces a diffuse cloud.
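The mapping stage can be sketched in a few lines. This is a minimal illustration of the linear‑ReLU‑linear meta‑learner shape described above, with randomly initialized weights and hypothetical sizes (d_emb = 2, |w| = 8, hidden width 16, 1000 edges); in the paper the embeddings and meta‑learner are trained jointly with the KAN on the task loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: embedding dim, |w|, hidden width, edges.
d_emb, w_dim, hidden, n_edges = 2, 8, 16, 1000

# Per-edge embeddings z_i and the shared two-layer meta-learner M_theta.
Z = rng.normal(size=(n_edges, d_emb))
W1 = rng.normal(size=(d_emb, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, w_dim))
b2 = np.zeros(w_dim)

def meta_learner(z):
    """Map low-dimensional embeddings to full |w|-dim coefficient vectors."""
    h = np.maximum(z @ W1 + b1, 0.0)   # linear + ReLU
    return h @ W2 + b2                 # linear

coeffs = meta_learner(Z)
print(coeffs.shape)   # (1000, 8)
```

Even though each row of `coeffs` lives in ℝ^8, every row is the image of a 2‑dimensional embedding through one shared map, which is exactly what confines the coefficients to a low‑dimensional manifold and makes them clusterable.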

Second, K‑means clustering is performed on the manifold‑shaped coefficient vectors. Because each centroid c_j is itself a |w|-dimensional vector, a single codebook entry stores |w| scalars, amortizing the storage cost across many edges. The compressed model stores n·⌈log₂ k⌉ bits of per‑edge indices plus |w|·k·b bits of codebook, where n is the number of edges, k the number of clusters, and b the bit‑width per scalar. Relative to the uncompressed cost of n·|w|·b bits, each index is amortized over the |w| scalars it selects, so the index term weighs far less for KANs than for scalar‑weight MLPs, yielding much higher compression ratios at the same k.
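Plugging hypothetical numbers into the storage formula shows how quickly the savings compound. A sketch assuming 100,000 edges, |w| = 8, k = 16 clusters, and float32 scalars (b = 32) — illustrative values, not the paper's exact configurations:

```python
import math

# Hypothetical configuration for illustration.
n = 100_000   # number of edges
w = 8         # coefficients per edge (|w|)
k = 16        # number of clusters
b = 32        # bits per scalar (float32)

dense_bits    = n * w * b                    # uncompressed per-edge vectors
index_bits    = n * math.ceil(math.log2(k))  # one centroid index per edge
codebook_bits = k * w * b                    # k centroids of |w| scalars each
compressed_bits = index_bits + codebook_bits

print(dense_bits / compressed_bits)   # ~63x compression for these sizes
```

Note that each 4‑bit index replaces a full 8×32 = 256‑bit coefficient vector, which is why the ratio stays high even before any lower‑precision quantization of the codebook itself.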

Third, after clustering, the meta‑learner and embeddings are discarded; only the codebook and per‑edge indices remain. A brief fine‑tuning phase (β ≪ α epochs) updates the centroids to recover any accuracy loss introduced by the quantization.
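The deployed representation is then just a table lookup. A minimal sketch with hypothetical small sizes (k = 4 centroids, |w| = 8, 10 edges) showing how coefficients are reconstructed from the codebook and indices alone, with no meta‑learner or embeddings in the loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small sizes for illustration.
w_dim, k, n_edges = 8, 4, 10

codebook = rng.normal(size=(k, w_dim))      # k centroids of |w| scalars each
indices = rng.integers(0, k, size=n_edges)  # per-edge centroid index

# At inference time each edge's coefficient vector is a single lookup;
# these are the entries the brief fine-tuning phase would update.
coeffs = codebook[indices]
print(coeffs.shape)   # (10, 8)
```

Because fine‑tuning only updates the k centroid rows (the indices stay fixed), the recovery phase touches k·|w| scalars rather than the original n·|w|, which is why it is cheap relative to the initial training.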

Experiments cover two families of models—fully‑connected KANs and convolutional KANs (ConvKANs)—each evaluated with three basis families (B‑splines, radial basis functions, Gram polynomials). Across 24 model‑basis combinations, the authors report up to 80× compression on image classification (MNIST, CIFAR‑10, CIFAR‑100) and 124.1× compression on high‑dimensional equation‑modeling tasks, all with negligible or no drop in test accuracy. For example, on CIFAR‑10 a fully‑connected MetaCluster‑KAN with 16 clusters reduces memory from ~3 MB to 38 KB (≈79.9×) while achieving 96.06 % accuracy, matching the uncompressed baseline. ConvKANs with 256 clusters show similar trends.

Ablation studies examine the influence of embedding dimensionality d_emb, number of clusters k, and coefficient vector size |w|. Results confirm that smaller d_emb yields tighter manifolds and better clustering, while increasing k improves reconstruction fidelity at the cost of a larger codebook. The authors also measure computational overhead: the meta‑learner adds roughly 5–10 % to total training time, and the clustering + fine‑tuning steps are lightweight, making MetaCluster practical for real‑world deployment.

In summary, MetaCluster provides a principled method to compress KANs without sacrificing their expressive benefits. By learning a low‑dimensional manifold for coefficient vectors, it makes high‑dimensional weight sharing feasible, and the resulting codebook‑based representation achieves orders‑of‑magnitude memory savings. This work opens the door for large‑scale scientific and vision applications of KANs in memory‑constrained environments.

