Learning the Mechanism of Catastrophic Forgetting: A Perspective from Gradient Similarity
Catastrophic forgetting during knowledge injection severely undermines the continual learning capability of large language models (LLMs). Although existing methods attempt to mitigate this issue, they often lack a foundational theoretical explanation. We establish a gradient-based theoretical framework to explain catastrophic forgetting. We first prove that strongly negative gradient similarity is a fundamental cause of forgetting. We then use gradient similarity to identify two types of neurons: conflicting neurons that induce forgetting and account for 50%–75% of neurons, and collaborative neurons that mitigate forgetting and account for 25%–50%. Based on this analysis, we propose a knowledge injection method, Collaborative Neural Learning (CNL). By freezing conflicting neurons and updating only collaborative neurons, CNL theoretically eliminates catastrophic forgetting under an infinitesimal learning rate η and an exactly known mastered set. Experiments on five LLMs, four datasets, and four optimizers show that CNL achieves zero forgetting in in-set settings and reduces forgetting by 59.1%–81.7% in out-of-set settings.
💡 Research Summary
The paper tackles the problem of catastrophic forgetting that occurs when large language models (LLMs) are injected with new knowledge. Existing mitigation techniques lack a solid theoretical foundation, and the authors fill this gap by developing a gradient‑based framework that explains why forgetting happens and how it can be prevented.
Theoretical Foundations
The authors model an LLM as a differentiable function f_θ with parameters θ. They define two disjoint data sets: the injection set I (containing samples the model currently gets wrong) and the mastered set M (containing samples the model already answers correctly). For each set they define a mean loss, L_I(θ) and L_M(θ). Using a first‑order Taylor expansion under an infinitesimal learning rate η, they derive:
- The loss change on the injection set after one gradient step is ΔL_I = −η ∇θL_I·∇θL_I = −η S(I,I). Since S(I,I) = ‖∇θL_I‖² is non‑negative, the loss on I always decreases (to first order), i.e., the model learns the new knowledge.
- The loss change on the mastered set is ΔL_M = –η ∇θL_M·∇θL_I = –η S(M,I). If the gradient similarity S(M,I) is negative, the loss on M increases, which is precisely catastrophic forgetting.
Thus, strongly negative gradient similarity between the gradients of M and I is identified as the fundamental cause of forgetting.
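This first‑order analysis can be checked numerically. The sketch below (not the paper's code) stands in two quadratic losses for L_I and L_M, whose gradients are exact, and verifies that one gradient step on the injection set changes the losses by approximately −η S(I,I) and −η S(M,I); the minimizers `a` and `b` are assumed values chosen so that S(M,I) < 0.

```python
import numpy as np

# Toy quadratic losses standing in for L_I and L_M (assumption for illustration).
theta = np.array([0.0, 0.0])
a = np.array([-2.0, -1.0])   # assumed minimizer of the injection loss L_I
b = np.array([3.0, -1.0])    # assumed minimizer of the mastered loss L_M

L_I = lambda t: 0.5 * np.sum((t - a) ** 2)
L_M = lambda t: 0.5 * np.sum((t - b) ** 2)
g_I = theta - a              # ∇θ L_I at theta
g_M = theta - b              # ∇θ L_M at theta

S_II = g_I @ g_I             # S(I,I) = ‖∇θL_I‖²  (always ≥ 0)
S_MI = g_M @ g_I             # S(M,I): here -5.0, i.e. negative similarity

eta = 1e-3
theta_new = theta - eta * g_I            # one gradient step on the injection set

dL_I = L_I(theta_new) - L_I(theta)       # ≈ -eta * S(I,I): injection loss drops
dL_M = L_M(theta_new) - L_M(theta)       # ≈ -eta * S(M,I): mastered loss rises
print(dL_I, -eta * S_II)
print(dL_M, -eta * S_MI)
```

Because the step size is small, the first‑order predictions match the exact loss changes to within O(η²), and the sign of ΔL_M is controlled entirely by the sign of S(M,I).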
Empirical Validation
The authors test this theory on five LLMs (Qwen2.5 1.5B, 3B, 7B and LLaMA3.2 1B, 3B) across four benchmark datasets (MMLU, MedQA, ARC‑C, CSQA). For each model they separate correctly answered questions (forming M) from incorrectly answered ones (forming I), then fine‑tune on I. Results show that while the models improve on I, they simultaneously suffer a substantial increase in loss on M, confirming the occurrence of catastrophic forgetting. Even parameter‑efficient fine‑tuning methods such as LoRA exhibit the same phenomenon.
Neuron‑Level Decomposition
The global gradient similarity S(M,I) is decomposed into per‑parameter contributions: s_j(M,I)=∇θ_jL_M·∇θ_jL_I. The authors define:
- Conflicting neurons (θ_CF): parameters with s_j(M,I) < 0. These neurons push the loss on M upward and are responsible for forgetting.
- Collaborative neurons (θ_CB): parameters with s_j(M,I) ≥ 0. These neurons reduce the loss on M and help retain knowledge.
Statistical analysis across all models shows that conflicting neurons constitute roughly 50%–75% of all parameters, while collaborative neurons account for 25%–50%. The dominance of conflicting neurons explains why the net effect is forgetting.
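The decomposition itself is elementwise and easy to sketch. Given per‑parameter gradients g_M and g_I (random stand‑ins here, not gradients from a real model), s_j = g_M[j]·g_I[j], the sign of s_j classifies each parameter, and the global similarity S(M,I) is recovered as the sum of the s_j:

```python
import numpy as np

rng = np.random.default_rng(0)
g_M = rng.standard_normal(10_000)   # stand-in gradient on the mastered set
g_I = rng.standard_normal(10_000)   # stand-in gradient on the injection set

s = g_M * g_I                        # per-parameter contributions s_j(M, I)
conflicting = s < 0                  # push L_M upward: cause forgetting
collaborative = ~conflicting         # s_j >= 0: help retain knowledge

assert np.isclose(s.sum(), g_M @ g_I)          # S(M,I) = sum of s_j
print(f"conflicting:   {conflicting.mean():.1%}")
print(f"collaborative: {collaborative.mean():.1%}")
```

Note that independent random gradients give roughly a 50/50 split; the 50%–75% conflicting fraction the paper reports is an empirical property of real fine‑tuning gradients, not of this stand‑in.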
Collaborative Neural Learning (CNL)
Motivated by the neuron‑level analysis, the authors propose a new training regime called Collaborative Neural Learning. CNL freezes all conflicting neurons (i.e., sets their updates to zero) and allows only collaborative neurons to be updated during knowledge injection. Formally, the update rule becomes:
θ_j ← θ_j − η ∇θ_jL_I(θ) if s_j(M,I) ≥ 0 (collaborative), and θ_j ← θ_j otherwise (conflicting, frozen).
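A minimal sketch of this masked update, reusing the toy quadratic losses from the earlier example (the minimizers `a` and `b` are assumed values, not the paper's setup): only parameters whose similarity contribution is non‑negative receive the gradient step on I.

```python
import numpy as np

theta = np.array([0.0, 0.0])
a = np.array([-2.0, -1.0])           # assumed minimizer of L_I
b = np.array([3.0, -1.0])            # assumed minimizer of L_M

L_I = lambda t: 0.5 * np.sum((t - a) ** 2)
L_M = lambda t: 0.5 * np.sum((t - b) ** 2)
g_I, g_M = theta - a, theta - b

eta = 1e-2
mask = (g_M * g_I) >= 0                      # collaborative neurons only
theta_full = theta - eta * g_I               # plain fine-tuning step
theta_cnl = theta - eta * mask * g_I         # CNL: conflicting neurons frozen

print(L_M(theta_full) - L_M(theta))  # > 0: the plain step forgets
print(L_M(theta_cnl) - L_M(theta))   # <= 0: CNL avoids forgetting
print(L_I(theta_cnl) - L_I(theta))   # < 0: new knowledge is still learned
```

Freezing the conflicting coordinates removes every negative term from the first‑order change in L_M, so the masked step cannot increase the mastered loss to first order, while the surviving (collaborative) coordinates still decrease L_I.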