Revisiting Weight Regularization for Low-Rank Continual Learning

Notice: This research summary and analysis were generated automatically using AI technology. For authoritative details, please refer to the original arXiv paper.

Continual Learning (CL) with large-scale pre-trained models (PTMs) has recently gained wide attention, shifting the focus from training from scratch to continually adapting PTMs. This has given rise to a promising paradigm: parameter-efficient continual learning (PECL), where task interference is typically mitigated by assigning a task-specific module during training, such as low-rank adapters. However, weight regularization techniques such as Elastic Weight Consolidation (EWC), a key strategy in CL, remain underexplored in this new paradigm. In this paper, we revisit weight regularization in low-rank CL as a new perspective for mitigating task interference in PECL. Unlike existing low-rank CL methods, we mitigate task interference by regularizing a shared low-rank update through EWC, thereby keeping the storage requirement and inference costs constant regardless of the number of tasks. Our proposed method EWC-LoRA leverages a low-rank representation to estimate parameter importance over the full-dimensional space. This design offers a practical, computation- and memory-efficient solution for CL with PTMs, and provides insights that may inform the broader application of regularization techniques within PECL. Extensive experiments on various benchmarks demonstrate the effectiveness of EWC-LoRA, achieving a stability-plasticity trade-off superior to existing low-rank CL approaches. These results indicate that, even under low-rank parameterizations, weight regularization remains an effective mechanism for mitigating task interference. Code is available at: https://github.com/yaoyz96/low-rank-cl.


💡 Research Summary

The paper tackles the challenge of continual learning (CL) with large pre‑trained models (PTMs) under the parameter‑efficient continual learning (PECL) paradigm, where the backbone model is frozen and lightweight adapters such as low‑rank LoRA modules are inserted for each new task. Existing PECL methods typically allocate a separate LoRA branch per task, which guarantees isolation but causes memory and compute costs to grow linearly with the number of tasks. Elastic Weight Consolidation (EWC), a classic weight‑regularization technique, can in principle prevent catastrophic forgetting by penalizing changes to important parameters, but its reliance on a full‑dimensional Fisher Information Matrix (FIM) makes it impractical for PTMs due to prohibitive storage and computation.
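The parameter efficiency at stake here can be made concrete with a toy sketch. The snippet below (an illustration, not the authors' code; layer sizes and rank are hypothetical ViT-like values) shows a frozen weight plus a shared low-rank update ΔW = AB, and compares the number of trainable parameters against full fine-tuning:

```python
import numpy as np

# Illustrative sketch of a LoRA-style low-rank update on one frozen layer.
d, k, r = 768, 768, 8              # hypothetical layer size and rank (not from the paper)

rng = np.random.default_rng(0)
W0 = rng.standard_normal((d, k))   # frozen pre-trained weight (never updated)
A = rng.standard_normal((d, r)) * 0.01
B = np.zeros((r, k))               # common LoRA init: B = 0, so the update starts at zero

delta_W = A @ B                    # the shared low-rank update, ΔW = AB
W_eff = W0 + delta_W               # effective weight used in the forward pass

full_params = d * k                # parameters trained under full fine-tuning
lora_params = d * r + r * k        # parameters trained under the low-rank update
print(lora_params / full_params)   # prints 0.0208...: about 2% of full fine-tuning
```

Because B is initialized to zero, W_eff equals W0 before any training, so adaptation starts exactly from the pre-trained model.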

To address this tension, the authors propose EWC‑LoRA, a method that (1) shares a single low‑rank update ΔW = AB across all tasks, (2) computes a full‑dimensional diagonal Fisher matrix after each task, and (3) adds a regularization term λ/2 · vec(AB)ᵀ F_cum · vec(AB) to the loss. This formulation regularizes the product of the low‑rank matrices rather than each factor independently, thereby capturing the true sensitivity of the model output to the low‑rank update. The authors prove mathematically that regularizing A and B separately diverges from full‑space regularization, and that using a fixed Fisher computed on the frozen backbone introduces noise because it includes directions irrelevant to the learned LoRA update.

Algorithmically, after training on task t the method (i) evaluates the loss L_t on the current task data, (ii) updates the shared LoRA parameters A and B via gradient descent, (iii) estimates the diagonal Fisher on the current parameters, and (iv) accumulates it into F_cum for use in the next task’s regularization. Only A, B, and the diagonal of the Fisher need to be stored, keeping the memory footprint constant regardless of task count.
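The per-task loop (i)–(iv) can be sketched end to end on a toy objective. Everything below is a hedged illustration under simplifying assumptions: a quadratic surrogate loss stands in for the task loss, and the squared-gradient Fisher estimate uses a single evaluation rather than an expectation over the task data.

```python
import numpy as np

# Toy sketch of the EWC-LoRA task loop: train shared A, B with the EWC
# penalty, then estimate and accumulate a diagonal Fisher for the next task.
rng = np.random.default_rng(2)
d, k, r, lr, lam = 8, 8, 2, 0.1, 1.0
A = rng.standard_normal((d, r)) * 0.1
B = np.zeros((r, k))
F_cum = np.zeros(d * k)            # accumulated diagonal Fisher (constant size)

def loss_and_grad(dW, target):
    # Toy surrogate for the task loss: 0.5 * ||dW - target||^2
    return 0.5 * np.sum((dW - target) ** 2), dW - target

for t in range(3):                 # three sequential "tasks"
    target = rng.standard_normal((d, k))
    for _ in range(200):
        dW = A @ B                             # (i) current low-rank update
        _, g = loss_and_grad(dW, target)
        g = g + lam * F_cum.reshape(d, k) * dW # add the EWC penalty gradient
        gA, gB = g @ B.T, A.T @ g              # chain rule through dW = AB
        A -= lr * gA                           # (ii) update the shared factors
        B -= lr * gB
    # (iii) diagonal Fisher estimate from squared per-parameter gradients
    _, g = loss_and_grad(A @ B, target)
    F_cum += (g ** 2).reshape(-1)              # (iv) accumulate for task t+1
```

Note that only A, B, and the length-`d*k` vector `F_cum` persist across tasks, so storage does not grow with the number of tasks, consistent with the constant-memory claim above.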

Extensive experiments on class‑incremental image classification benchmarks using Vision Transformers demonstrate that EWC‑LoRA outperforms prior low‑rank CL approaches (including multi‑task LoRA and O‑LoRA) by an average of 8.92 percentage points in accuracy. It also achieves a better stability‑plasticity trade‑off, maintaining higher performance on earlier tasks while still learning new ones effectively. Importantly, the method's computational overhead is roughly equal to or less than that of competing methods, because the Fisher is diagonal and the low‑rank matrices are small.

In summary, the paper provides the first systematic study of weight regularization within low‑rank continual learning, shows that naïve integration of EWC is suboptimal, and introduces a principled, memory‑efficient solution that leverages full‑dimensional importance estimates while operating entirely in the low‑rank space. This work suggests a new design principle—shared parameters plus regularization—as a viable alternative to task‑specific module isolation for scalable, efficient continual adaptation of large pre‑trained models.

