Interpretable Safety Alignment via SAE-Constructed Low-Rank Subspace Adaptation
...
📝 Original Info
- Title: Interpretable Safety Alignment via SAE-Constructed Low-Rank Subspace Adaptation
- ArXiv ID: 2512.23260
- Date: 2025-12-29
- Authors: Dianyun Wang, Qingsen Ma, Yuhu Shang, Zhifeng Lu, Zhenbo Xu, Lechen Ning, Huijia Wu, Zhaofeng He