Finding Structure in Continual Learning
Learning from a stream of tasks usually pits plasticity against stability: acquiring new knowledge often causes catastrophic forgetting of past information. Most methods address this by summing competing loss terms, creating gradient conflicts that are managed with complex and often inefficient strategies such as external memory replay or parameter regularization. We propose a reformulation of the continual learning objective using Douglas-Rachford Splitting (DRS). This reframes the learning process not as a direct trade-off, but as a negotiation between two decoupled objectives: one promoting plasticity for new tasks and the other enforcing stability of old knowledge. By iteratively finding a consensus through their proximal operators, DRS provides a more principled and stable learning dynamic. Our approach achieves an efficient balance between stability and plasticity without the need for auxiliary modules or complex add-ons, providing a simpler yet more powerful paradigm for continual learning systems.
💡 Research Summary
Continual learning (CL) faces the classic stability‑plasticity dilemma: a model must acquire new knowledge (plasticity) while retaining previously learned information (stability). Most existing CL methods address this by forming a single loss L_CL = L_new‑task + R_regularization, where the regularization term penalizes changes to important parameters or adds replay data. This formulation forces the two objectives into direct competition, leading to gradient conflicts that are usually mitigated with memory buffers, architectural expansion, or complex penalty schedules.
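The summed objective described above can be written down directly. Below is a minimal sketch using an EWC‑style quadratic penalty as one common instance of the regularization term R; the function and parameter names (`combined_loss`, `fisher`, `lam`) are illustrative, not from this paper:

```python
import numpy as np

def combined_loss(theta, new_task_loss, theta_old, fisher, lam):
    # L_CL = L_new-task + R: the regularizer penalizes movement of the
    # parameters that mattered for old tasks (EWC-style quadratic penalty).
    penalty = 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)
    return new_task_loss(theta) + penalty
```

Because both terms act on the same parameters within a single gradient update, their pulls can directly cancel — exactly the gradient conflict the splitting view is meant to avoid.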
The paper proposes a fundamentally different perspective: rather than summing the two objectives, it treats them as separate functions f (task‑fitting/plasticity) and g (stability/prior‑alignment) and solves the composite problem min f(θ,ϕ)+g(ϕ) via Douglas‑Rachford Splitting (DRS), a classic operator‑splitting algorithm. In this view, learning becomes a negotiation between two proximal sub‑problems instead of a tug‑of‑war.
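The DRS update rule itself is not reproduced in this excerpt; for reference, the standard iteration on min f + g alternates the two proximal maps with a consensus correction. A minimal sketch on a toy convex problem — the lasso choice of f and g, and the names `prox_f`, `prox_g`, `drs`, are illustrative stand‑ins, not the paper's:

```python
import numpy as np

# Douglas-Rachford splitting for min_x f(x) + g(x), here on a lasso toy
# problem: f(x) = 0.5*||Ax - b||^2 (a task-fitting term) and
# g(x) = lam*||x||_1 (a stand-in stability prior).

def prox_f(u, A, b, step):
    # prox_{step*f}(u): solve (I + step*A^T A) x = u + step*A^T b
    n = A.shape[1]
    return np.linalg.solve(np.eye(n) + step * A.T @ A, u + step * A.T @ b)

def prox_g(u, lam, step):
    # prox_{step*g}(u): soft-thresholding
    return np.sign(u) * np.maximum(np.abs(u) - lam * step, 0.0)

def drs(A, b, lam, step=1.0, iters=500):
    u = np.zeros(A.shape[1])                 # auxiliary "consensus" variable
    for _ in range(iters):
        x = prox_f(u, A, b, step)            # plasticity-style sub-problem
        y = prox_g(2 * x - u, lam, step)     # stability-style sub-problem
        u = u + y - x                        # Douglas-Rachford correction
    return y
```

At a fixed point the two sub‑problems agree (x = y), which is the "negotiation" described above: neither objective's gradient ever competes with the other's inside a single summed update.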
Key components of the method:
- Task‑fitting proximal (plasticity) – Compute x_i = prox_f(u_i) = arg min_{θ,ϕ} f(θ,ϕ) + (1/(2λ))‖(θ,ϕ) − u_i‖², i.e., fit the current task while staying close to the consensus point u_i.
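For deep networks this sub‑problem has no closed form; a common approximation is to run a few inner gradient steps on the penalized objective f(θ) + (1/(2λ))‖θ − u‖². A sketch under that assumption — `approx_prox_f`, `grad_f`, and the hyper‑parameters are illustrative names, not from the paper:

```python
import numpy as np

def approx_prox_f(grad_f, u, lam=1.0, step=0.1, inner_steps=100):
    # Approximate prox_{lam*f}(u) = argmin_theta f(theta) + (1/(2*lam))*||theta - u||^2
    # by inner gradient descent on the penalized objective, since a
    # deep-network loss f admits no closed-form proximal map.
    theta = u.copy()
    for _ in range(inner_steps):
        theta = theta - step * (grad_f(theta) + (theta - u) / lam)
    return theta
```

As a sanity check, for the quadratic f(θ) = 0.5‖θ − c‖² the exact prox with λ = 1 is (c + u)/2, which the inner loop recovers.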