Alignment-Aware Quantization for LLM Safety
Reading time: 2 minutes
...
📝 Original Info
- Title: Alignment-Aware Quantization for LLM Safety
- ArXiv ID: 2511.07842
- Date: 2025-11-11
- Authors: Not specified in the paper (anonymous submission; author names are typically listed as "Anonymous" in this case).
📝 Abstract
Safety and efficiency are paramount yet often conflicting requirements for deploying Large Language Models (LLMs). While LLMs are trained to follow human alignment for safety, Post-Training Quantization (PTQ) is applied afterward to ensure efficiency. Here we identify a fundamental flaw in the conventional PTQ paradigm: quantization can turn into a safety vulnerability if it only aims to achieve low perplexity. To address this, we propose Alignment-Aware Quantization (AAQ), a novel approach that integrates an Alignment-Preserving Contrastive (APC) loss into the PTQ pipeline. Our method explicitly preserves alignment by encouraging the quantized model to mimic its safe, instruction-tuned model while diverging from the unaligned, pre-trained counterpart. AAQ achieves robust safety alignment without specialized safety-focused datasets, using only standard calibration data. We show that AAQ is compatible with standard PTQ techniques and enables robust 4-bit (W4A4) quantization across diverse model families. Our work resolves the critical trade-off between efficiency and safety, paving the way toward LLMs that are both efficient and trustworthy. Anonymized code is available in the supplementary material.
💡 Deep Analysis
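The abstract only describes the Alignment-Preserving Contrastive (APC) loss at a high level. The sketch below illustrates one plausible reading: on a standard calibration batch, the quantized model's output distribution is pulled (via KL divergence) toward the aligned, instruction-tuned model and pushed away from the unaligned, pre-trained model. The function name `apc_loss`, the KL-based formulation, and the `repel_weight` hyperparameter are illustrative assumptions, not the paper's exact definition.

```python
# Minimal sketch of an Alignment-Preserving Contrastive (APC) loss.
# Assumptions (not taken from the paper): the loss compares token-level
# output distributions via KL divergence and combines an "attract" term
# (toward the aligned teacher) with a weighted "repel" term (away from
# the unaligned base model).
import torch
import torch.nn.functional as F


def apc_loss(quant_logits: torch.Tensor,
             aligned_logits: torch.Tensor,
             base_logits: torch.Tensor,
             repel_weight: float = 0.5) -> torch.Tensor:
    """Pull the quantized model toward the aligned (instruction-tuned) model
    and push it away from the unaligned pre-trained model.

    All tensors have shape (batch, seq_len, vocab_size) and are computed
    on the same calibration batch.
    """
    log_q = F.log_softmax(quant_logits, dim=-1)
    p_aligned = F.softmax(aligned_logits, dim=-1)
    p_base = F.softmax(base_logits, dim=-1)

    # Attract term: KL(aligned || quantized) -- mimic the safe, aligned teacher.
    attract = F.kl_div(log_q, p_aligned, reduction="batchmean")
    # Repel term: KL(base || quantized) -- subtracted so the quantized model
    # diverges from the unaligned pre-trained counterpart.
    repel = F.kl_div(log_q, p_base, reduction="batchmean")

    return attract - repel_weight * repel
```

In a PTQ calibration loop, a term of this kind would presumably be combined with the usual reconstruction or perplexity objective, so that the quantized weights are tuned against both reference models on the same standard calibration data, as the abstract indicates.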
📄 Full Content
Reference
This content is AI-processed based on open access ArXiv data.