GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR

Notice: This research summary and analysis were automatically generated using AI technology. For complete accuracy, please refer to the original arXiv source.

Reinforcement Learning with Verifiable Rewards (RLVR) is crucial for advancing large-scale reasoning models. However, existing parameter-efficient methods, such as PiSSA and MiLoRA, are designed for Supervised Fine-Tuning (SFT) and do not account for the distinct optimization dynamics and geometric structures of RLVR. Applying these methods directly leads to spectral collapse and optimization instability, which severely limit model performance. Meanwhile, alternative approaches that leverage update sparsity encounter significant efficiency bottlenecks on modern hardware due to unstructured computations. To address these challenges, we propose GeoRA (Geometry-Aware Low-Rank Adaptation), which exploits the anisotropic and compressible nature of RL update subspaces. GeoRA initializes adapters by extracting principal directions via Singular Value Decomposition (SVD) within a geometrically constrained subspace while freezing the residual components. This method preserves the pre-trained geometric structure and enables efficient GPU computation through dense operators. Experiments on Qwen and Llama demonstrate that GeoRA mitigates optimization bottlenecks caused by geometric misalignment. It consistently outperforms established low-rank baselines on key mathematical benchmarks, achieving state-of-the-art (SOTA) results. Moreover, GeoRA shows superior generalization and resilience to catastrophic forgetting in out-of-domain tasks.


💡 Research Summary

The paper introduces GeoRA (Geometry‑Aware Low‑Rank Adaptation), a parameter‑efficient fine‑tuning method specifically designed for Reinforcement Learning with Verifiable Rewards (RLVR). Existing PEFT techniques such as PiSSA and MiLoRA were created for supervised fine‑tuning and assume that the most effective updates lie along the principal components of the weight matrix. In contrast, mechanistic studies of RLVR show that stable updates tend to stay orthogonal to the pre‑trained dominant directions, preserving the original geometry while making small, anisotropic changes. Directly applying SVD‑based adapters to RLVR therefore causes spectral collapse, instability, and poor performance. Sparse‑based PEFT also suffers from hardware inefficiency because modern GPUs cannot accelerate unstructured sparsity.

GeoRA addresses these issues by first constructing a geometry‑constrained matrix W_geo through a dual‑masking process. The Spectral Prior (M_Spec) selects low‑magnitude entries from a low‑rank approximation of W, suppressing high‑curvature components, while the Euclidean Prior (M_Euc) selects small‑magnitude weights to retain plasticity. The union of these masks yields W_geo = W ⊙ (M_Spec ∪ M_Euc).
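The dual-masking step can be sketched as follows. This is a minimal NumPy illustration based only on the description above; the exact rank of the low-rank approximation, the tie-breaking rule at the magnitude threshold, and whether both priors share the same sparsity ratio ρ are illustrative assumptions, not details from the paper.

```python
import numpy as np

def geometry_constrained_matrix(W, rank=16, rho=0.2):
    """Build W_geo = W ⊙ (M_Spec ∪ M_Euc) as described in the summary.

    Assumptions (illustrative): both masks keep the rho-fraction of
    smallest-magnitude entries, and the spectral prior uses a rank-`rank`
    truncated SVD of W as its low-rank approximation.
    """
    # Low-rank approximation of W for the Spectral Prior.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_lr = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

    k = max(1, int(rho * W.size))

    # M_Spec: low-magnitude entries of the low-rank approximation
    # (suppressing high-curvature components).
    spec_thresh = np.partition(np.abs(W_lr).ravel(), k - 1)[k - 1]
    M_spec = np.abs(W_lr) <= spec_thresh

    # M_Euc: small-magnitude weights of W itself (retaining plasticity).
    euc_thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    M_euc = np.abs(W) <= euc_thresh

    # Union of the two masks, applied elementwise to W.
    return W * (M_spec | M_euc)
```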

A singular value decomposition is then performed on W_geo, and the top‑r singular components are used to initialize low‑rank adapters A_geo and B_geo as follows: A_geo = Σ_geo^{1/2} V_geo^T, B_geo = U_geo Σ_geo^{1/2}. These adapters reconstruct a rank‑r approximation of W_geo. To keep the model’s output unchanged at initialization, a residual matrix W_res = W − α·r·B_geo·A_geo is frozen throughout training. The forward pass therefore computes h = W_res x + α·r·B_geo·A_geo x, ensuring that only the geometry‑aligned subspace (parameterized by A_geo and B_geo) is trainable while the frozen residual acts as a stability leash.
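The initialization and forward pass can be sketched as below. This follows the formulas quoted above literally, including the α·r scaling factor (note that standard LoRA-style methods typically use α/r; we reproduce the summary's expression as written). The function names and the exact shapes are illustrative, not taken from the paper's code.

```python
import numpy as np

def init_geora(W, W_geo, r=16, alpha=1.0):
    """Initialize A_geo, B_geo from the top-r SVD of W_geo and freeze
    the residual W_res so the output is unchanged at initialization."""
    # SVD of the geometry-constrained matrix, not of W itself.
    U, S, Vt = np.linalg.svd(W_geo, full_matrices=False)
    sqrt_S = np.sqrt(S[:r])
    A = sqrt_S[:, None] * Vt[:r, :]   # A_geo = Σ_geo^{1/2} V_geo^T, shape (r, d_in)
    B = U[:, :r] * sqrt_S[None, :]    # B_geo = U_geo Σ_geo^{1/2},   shape (d_out, r)
    scale = alpha * r                 # scaling as written in the summary
    W_res = W - scale * (B @ A)       # frozen residual ("stability leash")
    return W_res, A, B, scale

def geora_forward(x, W_res, A, B, scale):
    # h = W_res x + α·r·B_geo·A_geo x; only A and B would be trainable.
    return W_res @ x + scale * (B @ (A @ x))
```

At initialization the two terms sum back to W x exactly, so fine-tuning starts from the pre-trained model's behavior while restricting gradient flow to the geometry-aligned subspace.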

Experiments were conducted on Qwen‑3‑8B‑Base and Llama‑3.1‑8B‑Instruct, fine‑tuned on the DeepMath‑103K dataset using the GRPO RLVR algorithm. With rank r = 16 and sparsity ratio ρ = 0.2, GeoRA consistently outperformed LoRA, PiSSA, MiLoRA, SparseFT, and even full‑parameter fine‑tuning on both in‑distribution (MATH‑500, AIME, OLYMMA‑TH) and out‑of‑distribution (HumanEval, GPQA, MMLU) benchmarks. Notably, GeoRA achieved the highest average scores (34.04 for Qwen, 24.51 for Llama) and demonstrated faster, more stable convergence—reaching peak performance around 300 training steps, well before other methods.

In terms of efficiency, GeoRA reduces trainable parameters by 99.5 % (down to 0.04 B from 8 B) and cuts VRAM usage by roughly 28 %, while also decreasing wall‑clock time per iteration by 19.9 % compared to full fine‑tuning. Ablation studies show that both the Spectral and Euclidean priors are essential; removing either degrades performance. Random or “tail” initialization (using low‑rank components not aligned with the geometry) also underperforms relative to the proposed SVD‑based initialization. Moreover, GeoRA remains robust across a wide learning‑rate range (0.2–1.0), whereas baseline methods often diverge at higher rates.

Overall, GeoRA provides a principled way to align low‑rank adaptation with the anisotropic, compressible update subspace characteristic of RLVR. By preserving the pre‑trained geometric structure through a frozen residual and focusing updates on a geometry‑aware low‑rank manifold, it simultaneously improves performance, stability, and computational efficiency. The work suggests a new direction for PEFT in reinforcement‑learning settings and opens avenues for scaling the approach to larger models and multimodal domains.
