CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Post-training compression of large language models (LLMs) often relies on low-rank weight approximations that represent each column of the weight matrix in a shared low-dimensional subspace. This strategy is computationally efficient, but the underlying constraint can be overly rigid for heterogeneous projection weights and may incur avoidable accuracy loss. We propose CoSpaDi (Compression via Sparse Dictionary Learning), a training-free framework that replaces low-rank factorization with a structured sparse decomposition in which each weight matrix is represented as a dense dictionary multiplied by a column-sparse coefficient matrix. This yields a union-of-subspaces model: the columns of the weight matrix are represented as linear combinations of different subsets of dictionary atoms, improving expressiveness at a fixed parameter budget. CoSpaDi is calibration-guided: using a small calibration set, we optimize the factorization to minimize the functional reconstruction error of layer outputs rather than weight-space error. An activation-derived Gram orthonormalization reformulates this data-aware objective into a standard dictionary learning problem on transformed weights, and we support both per-layer compression and cross-layer dictionary sharing within groups of similar projections. Across Llama and Qwen model families, CoSpaDi consistently improves the accuracy–compression and perplexity–compression trade-offs over state-of-the-art SVD-based baselines and strong structured pruning baselines at 20–40% compression ratios. The resulting structured sparsity enables sparse–dense computation and integrates with post-training quantization of the sparse coefficients.


💡 Research Summary

CoSpaDi introduces a novel, training‑free compression framework for large language models (LLMs) that replaces the conventional low‑rank singular value decomposition (SVD) with a structured sparse dictionary learning (SDL) approach. The method factorizes each weight matrix W∈ℝ^{d₁×d₂} into a dense dictionary D∈ℝ^{d₁×k} and a column‑sparse coefficient matrix S∈ℝ^{k×d₂}, where each column of S contains at most s non‑zero entries. This “union‑of‑subspaces” representation allows different columns of W to be reconstructed from different subsets of dictionary atoms, providing greater expressive power than a single shared subspace while keeping the total parameter count comparable to low‑rank approximations.
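The column-sparse factorization can be sketched in a few lines of NumPy. The dimensions, dictionary size k, and sparsity level s below are illustrative placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 1024, 1024   # illustrative projection dimensions (hypothetical)
k, s = 256, 16        # dictionary atoms and non-zeros per column (hypothetical)

D = rng.standard_normal((d1, k))   # dense dictionary, shared by all columns
S = np.zeros((k, d2))              # column-sparse coefficient matrix
for j in range(d2):
    # each column of W draws its own subset of s atoms: a union of subspaces
    rows = rng.choice(k, size=s, replace=False)
    S[rows, j] = rng.standard_normal(s)

W_hat = D @ S                      # reconstructed weight matrix (d1 x d2)

# Storage budget: dense dictionary plus sparse values and their row indices,
# to be compared against r * (d1 + d2) for a rank-r low-rank factorization.
sdl_params = d1 * k + 2 * s * d2
```

Because each column selects its own support in S, different columns of W live in different s-dimensional subspaces spanned by subsets of the k atoms, whereas a rank-r SVD forces all columns into one shared r-dimensional subspace.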

A key innovation is the calibration‑guided functional objective. Instead of minimizing the Frobenius norm of the weight reconstruction error, CoSpaDi minimizes the output reconstruction error on a small calibration dataset: ‖XW − XDS‖_F, where X∈ℝ^{N×d₁} holds N activation samples. To decouple the data‑dependent term, the authors compute a Gram‑orthonormalization matrix L from the calibration inputs (e.g., via a Cholesky factorization of XᵀX or a QR factorization of X). Defining the transformed variables W_L = L W and D_L = L D reduces the functional loss to a standard SDL problem, min_{D_L, S} ‖W_L − D_L S‖_F subject to the column‑sparsity constraint. After solving this transformed problem, the compressed weight is recovered as W̃ = D S with D = L⁻¹ D_L.
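The transform can be sketched with NumPy. Note one convention detail: `np.linalg.cholesky` returns a lower-triangular L with XᵀX = L Lᵀ, so in this sketch the whitening transform is Lᵀ (the summary's L under the XᵀX = LᵀL convention). Sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d1, d2 = 256, 64, 48            # small illustrative sizes (hypothetical)
X = rng.standard_normal((N, d1))   # calibration activations
W = rng.standard_normal((d1, d2))  # weight matrix to compress

G = X.T @ X                        # Gram matrix of the calibration inputs
L = np.linalg.cholesky(G)          # lower-triangular L with G = L @ L.T
T = L.T                            # whitening transform: ||X M||_F == ||T M||_F

W_L = T @ W                        # transformed weights: a standard SDL target
# ... solve min ||W_L - D_L S||_F over (D_L, S), then map the dictionary back:
# D = np.linalg.solve(T, D_L)

# sanity check: the data-aware loss equals the transformed Frobenius loss
M = rng.standard_normal((d1, d2))
functional = np.linalg.norm(X @ M)
transformed = np.linalg.norm(T @ M)
```

The identity holds because ‖XM‖_F² = tr(Mᵀ XᵀX M) = tr(Mᵀ L Lᵀ M) = ‖Lᵀ M‖_F², which is what lets the data-aware objective be solved as an ordinary dictionary learning problem on W_L.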

Optimization proceeds by alternating minimization: (1) sparse coding of each column of WL using Orthogonal Matching Pursuit (OMP) to enforce the s‑nonzero constraint, and (2) dictionary update via either Method of Optimal Directions (MOD) or K‑SVD. The authors favor K‑SVD with power‑iteration rank‑1 updates for a good trade‑off between speed and accuracy. The framework naturally supports per‑layer compression as well as cross‑layer dictionary sharing for groups of similar projections (e.g., all feed‑forward layers or all attention Q/K/V matrices), further reducing storage.
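The alternating scheme can be illustrated with a toy implementation. For brevity this sketch uses the MOD-style least-squares dictionary update rather than the K-SVD variant the authors favor, and all function names are ours:

```python
import numpy as np

def omp_column(D, w, s):
    """Greedy OMP: approximate column w with at most s atoms of D."""
    support, coef = [], np.zeros(0)
    r = w.copy()
    for _ in range(s):
        j = int(np.argmax(np.abs(D.T @ r)))        # most correlated atom
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], w, rcond=None)
        r = w - D[:, support] @ coef               # update the residual
    col = np.zeros(D.shape[1])
    col[support] = coef
    return col

def sparse_dict_learn(W, k, s, iters=10, seed=0):
    """Alternating minimization: OMP sparse coding + MOD dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((W.shape[0], k))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(iters):
        S = np.stack([omp_column(D, W[:, j], s) for j in range(W.shape[1])], axis=1)
        D = W @ np.linalg.pinv(S)                  # MOD: least-squares update
        norms = np.linalg.norm(D, axis=0)
        D /= np.where(norms > 0, norms, 1.0)       # renormalize atoms
    S = np.stack([omp_column(D, W[:, j], s) for j in range(W.shape[1])], axis=1)
    return D, S
```

K-SVD differs from MOD by updating atoms one at a time with rank-1 refits of the residual (the paper accelerates these with power iteration), but the alternation between sparse coding and dictionary update is the same.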

Experiments cover Llama‑2 (7B and 13B) and Qwen‑7B models under compression ratios of 20%–40%. CoSpaDi is benchmarked against state‑of‑the‑art SVD‑based methods (SVD‑LLM, Basis‑Sharing) and strong structured pruning baselines. Across most settings, CoSpaDi incurs less than a 0.2% drop in accuracy while improving perplexity by 1–2% relative to the baselines. The resulting structured sparsity aligns with hardware‑friendly patterns (e.g., 2:4 or 4:8 sparsity), enabling practical speedups on modern GPUs with sparse tensor cores. Moreover, the sparse coefficient matrix S can be quantized post‑training (PTQ) to 4‑bit precision, yielding additional memory savings without noticeable performance loss.
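As a rough illustration of the PTQ step, the sketch below applies per-column symmetric round-to-nearest 4-bit quantization to the coefficient matrix; this is a generic scheme (with hypothetical function names), not necessarily the paper's exact recipe:

```python
import numpy as np

def quantize_int4(S):
    """Per-column symmetric round-to-nearest 4-bit PTQ of the sparse
    coefficients; zeros stay zero, so the sparsity pattern is preserved."""
    Q = np.zeros(S.shape, dtype=np.int8)
    scales = np.ones(S.shape[1])
    for j in range(S.shape[1]):
        amax = np.max(np.abs(S[:, j]))
        if amax == 0.0:
            continue                                    # empty column: skip
        scales[j] = amax / 7.0                          # symmetric int4 range [-7, 7]
        Q[:, j] = np.clip(np.round(S[:, j] / scales[j]), -7, 7).astype(np.int8)
    return Q, scales

def dequantize(Q, scales):
    return Q.astype(np.float64) * scales[np.newaxis, :]
```

Since only the non-zero values of S (plus one scale per column) need to be stored, 4-bit coefficient values compound with the structured sparsity for additional memory savings.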

Limitations include the need to select the dictionary size k and sparsity level s, which may require modest hyper‑parameter tuning, and the cost of running OMP over every column, which can be substantial for very large matrices. The current study focuses on projection matrices; extending the approach to LayerNorm, bias terms, or other architectural components remains future work. Potential directions include automated hyper‑parameter search, integration with hardware‑specific sparse kernels, and broader cross‑layer sharing schemes.

In summary, CoSpaDi offers a compelling alternative to low‑rank post‑training compression by leveraging sparse dictionary learning guided by calibration data. It delivers superior accuracy‑compression trade‑offs, maintains compatibility with quantization, and produces structured sparsity amenable to efficient inference, positioning it as a practical tool for deploying LLMs on resource‑constrained platforms.

