Astro: Activation-guided Structured Regularization for Outlier-Robust LLM Post-Training Quantization
Weight-only post-training quantization (PTQ) is crucial for efficient Large Language Model (LLM) deployment but suffers from accuracy degradation caused by weight and activation outliers. Existing mitigation strategies often face critical limitations: they either yield insufficient outlier suppression or incur significant deployment inefficiencies, such as added inference latency, heavy preprocessing, or reliance on complex operator fusion. To resolve these limitations, we leverage a key insight: over-parameterized LLMs often converge to flat minima, implying a vast equivalent solution space in which weights can be adjusted without compromising accuracy. Building on this, we propose Astro, an Activation-guided Structured Regularization framework designed to suppress the negative effects of outliers in a hardware-friendly and efficient manner. Leveraging the activation-guided regularization objective, Astro actively reconstructs intrinsically robust weights, aggressively suppressing weight outliers corresponding to high-magnitude activations without sacrificing model accuracy. Crucially, Astro introduces zero inference latency and is orthogonal to mainstream quantization methods such as GPTQ. Extensive experiments show that Astro achieves highly competitive performance; notably, on LLaMA-2-7B, it outperforms complex learning-based rotation methods in roughly one-third of the quantization time.
💡 Research Summary
The paper introduces Astro, a novel post‑training quantization (PTQ) framework for large language models (LLMs) that specifically targets the degradation caused by weight and activation outliers. The authors observe that over‑parameterized LLMs typically converge to flat minima in the loss landscape, which implies a large manifold of weight configurations that achieve essentially the same training loss. This theoretical insight (Theorem 4.3) provides a degree of freedom: weights can be moved within this flat region without harming model accuracy, allowing the search for weight sets that are intrinsically more robust to low‑bit quantization.
A second theoretical contribution (Theorem 4.5) derives an upper bound on quantization error, showing that the error is multiplicatively coupled to the magnitude of the activations that a weight group processes. In other words, a weight outlier in a channel with large activation values will amplify quantization error far more than an outlier in a low-activation channel. This motivates an activation-guided structured regularization objective: for each weight group $k$, compute the Frobenius norm $\|X_k\|_F$ of the corresponding activation slice (using a small calibration set). The regularization strength $\alpha_k$ is set proportional to this norm, i.e., $\alpha_k \propto \|X_k\|_F$. High-activation groups receive strong L2 regularization, which pushes their weights toward smaller magnitudes and suppresses outliers; low-activation groups receive weak regularization, preserving fine-grained weight information.
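The per-group strength assignment described above can be sketched in a few lines of NumPy. This is an illustrative reading of the rule, not the paper's released code; the function name `activation_guided_strengths`, the row-block grouping, and the `base_alpha` normalization are assumptions for the example.

```python
import numpy as np

def activation_guided_strengths(X, group_size, base_alpha=1e-4):
    """Per-group regularization strengths with alpha_k proportional to ||X_k||_F.

    X: calibration activations of shape (n_tokens, in_features).
    Weights are grouped along the input dimension in blocks of `group_size`;
    X_k is the activation slice that feeds weight group k.
    """
    n_tokens, in_features = X.shape
    norms = []
    for start in range(0, in_features, group_size):
        Xk = X[:, start:start + group_size]
        norms.append(np.linalg.norm(Xk))  # Frobenius norm of the slice
    norms = np.array(norms)
    # Normalize so that strengths stay proportional to the activation norms.
    return base_alpha * norms / norms.mean()

# Example: a high-magnitude channel group gets the strongest regularization.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 64))
X[:, :16] *= 10.0  # simulate an outlier (high-activation) channel group
alphas = activation_guided_strengths(X, group_size=16)
print(alphas)  # alphas[0] is the largest entry
```

The only inputs needed are a small calibration batch and the chosen group size, which matches the paper's claim of light preprocessing.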
The overall optimization problem becomes:
$$\min_{W}\; \big\|XW - XW_0\big\|_F^2 \;+\; \sum_{k} \alpha_k \,\|W_k\|_2^2, \qquad \alpha_k \propto \|X_k\|_F,$$

where $W_0$ denotes the original pre-trained weights and $W_k$ the $k$-th weight group.
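An objective of this form (activation reconstruction plus activation-weighted per-group L2) can be minimized with plain gradient descent. The sketch below is a self-contained toy, not the paper's solver: the solver form, learning rate, and step count are assumptions, and it only illustrates that high-α groups are pulled toward smaller magnitudes while the layer's output changes little.

```python
import numpy as np

def regularized_reconstruction(X, W0, alphas, group_size, lr=1e-3, steps=500):
    """Gradient descent on (1/n)||XW - XW0||_F^2 + sum_k alpha_k ||W_k||_2^2.

    W_k are row-blocks of W (grouped along the input dimension), so each
    group k is regularized in proportion to its activation norm alpha_k.
    """
    n = X.shape[0]
    W = W0.copy()
    starts = list(range(0, W0.shape[0], group_size))
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ (W - W0))        # reconstruction term
        for k, s in enumerate(starts):                  # per-group L2 term
            grad[s:s + group_size] += 2.0 * alphas[k] * W[s:s + group_size]
        W -= lr * grad
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 64))
X[:, :16] *= 10.0                      # outlier (high-activation) channel group
W0 = rng.normal(size=(64, 8))

# alpha_k proportional to the Frobenius norm of the activation slice X_k
norms = np.array([np.linalg.norm(X[:, s:s + 16]) for s in range(0, 64, 16)])
alphas = norms / norms.mean()

W = regularized_reconstruction(X, W0, alphas, group_size=16)
shrink = np.linalg.norm(W[:16]) / np.linalg.norm(W0[:16])
rel_err = np.linalg.norm(X @ W - X @ W0) / np.linalg.norm(X @ W0)
print(f"group-0 weight norm ratio: {shrink:.3f}, relative output error: {rel_err:.3f}")
```

The run shows the trade-off the flat-minima argument exploits: weight magnitudes shrink under the activation-weighted penalty while the layer output stays close to the original, leaving a weight tensor that is friendlier to low-bit quantization.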