EMSEdit: Efficient Multi-Step Meta-Learning-based Model Editing
Large Language Models (LLMs) power numerous AI applications, yet updating their knowledge remains costly. Model editing provides a lightweight alternative through targeted parameter modifications, with meta-learning-based model editing (MLME) demonstrating strong effectiveness and efficiency. However, we find that MLME struggles in low-data regimes and incurs high training costs due to the use of KL divergence. To address these issues, we propose **E**fficient **M**ulti-**S**tep **Edit (EMSEdit)**, which leverages multi-step backpropagation (MSBP) to effectively capture gradient-activation mapping patterns within editing samples, performs multi-step edits per sample to enhance editing performance under limited data, and introduces norm-based regularization to preserve unedited knowledge while improving training efficiency. Experiments on two datasets and three LLMs show that EMSEdit consistently outperforms state-of-the-art methods in both sequential and batch editing. Moreover, MSBP can be seamlessly integrated into existing approaches to yield additional performance gains. Further experiments on a multi-hop reasoning editing task demonstrate EMSEdit's robustness in handling complex edits, while ablation studies validate the contribution of each design component. Our code is available at https://github.com/xpq-tech/emsedit.
💡 Research Summary
The paper addresses the problem of updating factual knowledge in large language models (LLMs) without the prohibitive cost of full retraining. Model editing, which modifies only a small subset of parameters, has emerged as a lightweight alternative, and meta‑learning‑based model editing (MLME) methods such as MEND, RLEdit, and MALMEN have shown promising effectiveness and efficiency. However, the authors identify two critical limitations of existing MLME approaches. First, they rely on a single back‑propagation step to transform fine‑tuning gradients and hidden activations into weight updates. Consequently, when training data are scarce—a common scenario in real‑world applications—the performance of MLME degrades sharply. Second, MLME typically uses a KL‑divergence loss to preserve the original model’s knowledge, which requires a second forward pass through the original model for every training iteration, dramatically increasing computational and memory overhead.
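The cost asymmetry between the two preservation losses can be sketched in a toy form. The function names and the squared-norm penalty below are illustrative assumptions for exposition, not the paper's exact loss definitions; the point is only that the KL term needs the frozen original model's logits (a second forward pass every iteration), while a norm penalty is computed from the weight update alone:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_preservation_loss(edited_logits, original_logits):
    """KL(p_orig || p_edited): obtaining original_logits requires an
    extra forward pass through the frozen original model each step."""
    p = softmax(original_logits)
    q = softmax(edited_logits)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def norm_preservation_loss(delta_w, reg=1e-2):
    """Norm-based alternative: penalize the weight update itself, so no
    extra forward pass through the original model is needed."""
    return reg * float(np.sum(delta_w ** 2))

# identical logits -> zero KL; zero update -> zero norm penalty
logits = np.array([1.0, 2.0, 3.0])
kl_zero = kl_preservation_loss(logits, logits)
norm_zero = norm_preservation_loss(np.zeros((2, 2)))
```

Both losses vanish when nothing changes, but only the norm-based one avoids duplicating the original model's forward computation.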
To overcome these issues, the authors propose EMSEdit (Efficient Multi‑Step Edit). EMSEdit introduces multi‑step back‑propagation (MSBP), whereby each editing sample undergoes several forward‑backward cycles before the hypernetwork produces a weight update. This repeated processing allows the hypernetwork to capture richer gradient‑activation mapping patterns, effectively amplifying the learning signal from limited data. In parallel, EMSEdit replaces the KL loss with an L2‑norm regularization on the weight update. By penalizing the magnitude of the update, the method encourages minimal deviation from the original model while requiring only a single forward pass, thereby reducing both memory consumption and training time.
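A minimal sketch of the MSBP idea, using a single linear layer with a squared-error edit loss so the gradients can be computed in closed form. The layer, loss, learning rate, and regularization weight are all illustrative assumptions; the real EMSEdit hypernetwork consumes the collected (gradient, activation) pairs rather than applying raw gradient steps:

```python
import numpy as np

def msbp_edit(W, x, y_target, steps=3, lr=0.5, reg=1e-2):
    """Run several forward-backward cycles on one editing sample
    (y = W @ x, loss = 0.5 * ||y - y_target||^2), collecting the
    per-step (gradient, activation) pairs a hypernetwork would see.
    Returns the pairs, the accumulated update delta_W, and an
    L2-norm regularization term in place of a KL loss."""
    W = W.copy()
    W0 = W.copy()
    pairs = []
    for _ in range(steps):
        y = W @ x                          # forward pass (activation x)
        grad = np.outer(y - y_target, x)   # backward pass: dL/dW
        pairs.append((grad, x))
        W -= lr * grad                     # inner edit step
    delta_W = W - W0
    # norm-based regularization keeps the edit close to the original weights
    reg_loss = reg * float(np.sum(delta_W ** 2))
    return pairs, delta_W, reg_loss

# one editing sample processed for three steps instead of one
W = np.zeros((2, 2))
x = np.array([1.0, 0.0])
y_t = np.array([1.0, 0.0])
pairs, dW, reg_loss = msbp_edit(W, x, y_t, steps=3)
```

With multiple steps, the gradient at each cycle reflects the partially edited weights, giving the hypernetwork several distinct gradient-activation pairs per sample instead of one.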
Architecturally, EMSEdit distinguishes between sequential and batch editing scenarios. For sequential editing, it employs a step‑specific hypernetwork that adapts its transformation at each edit step, preserving specificity across a series of edits. For batch editing, it introduces a step‑wise hypernetwork updating mechanism that accumulates updates across the batch without inflating memory usage. These designs balance editing efficacy with computational efficiency.
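The sequential design above can be sketched as a hypernetwork that keeps separate parameters per edit step. The elementwise-scale transformation and the class interface below are illustrative assumptions, not the paper's architecture; they only show how step-indexed parameters keep later edits from overwriting the transformation learned for earlier ones:

```python
import numpy as np

class StepSpecificHypernet:
    """Toy step-specific hypernetwork for sequential editing: edit step t
    has its own parameters mapping a fine-tuning gradient to a weight
    update (hypothetical elementwise-scale form)."""
    def __init__(self, shape, num_steps, lr=0.1):
        # one learnable transformation per step; here just a scale tensor
        self.scales = [np.full(shape, lr) for _ in range(num_steps)]

    def edit(self, W, grad, step):
        # apply the step-t transformation to produce this step's update
        return W - self.scales[step] * grad

# a short sequence of edits, each using its own step parameters
hn = StepSpecificHypernet(shape=(2, 2), num_steps=3)
W = np.zeros((2, 2))
for t in range(3):
    W = hn.edit(W, grad=np.eye(2), step=t)
```

A batch-editing variant would instead accumulate the per-sample updates produced at each step before applying them, so memory does not grow with batch size.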
The authors evaluate EMSEdit on two benchmark datasets—ZsRE and CounterFact—using three LLMs of varying scale: GPT‑J (6 B), LLaMA‑3 (8 B), and Gemma‑2 (9 B). Experiments cover both sequential and batch editing, as well as a more challenging multi‑hop reasoning editing task. Across all settings, EMSEdit consistently outperforms state‑of‑the‑art baselines (RLEdit, MALMEN, and other recent methods). In low‑data regimes (as little as 10 % of the full training set), EMSEdit’s efficacy and generalization drop only marginally, whereas baselines suffer substantial performance loss. The L2 regularization yields a roughly 30 % reduction in memory footprint and a 25–35 % speed‑up in training time compared with KL‑based training.
A key additional finding is that MSBP can be retrofitted onto existing MLME methods without architectural changes. When the authors augment RLEdit and MALMEN with MSBP, they observe consistent gains of 2–3 percentage points in editing success rates, demonstrating the broad applicability of the multi‑step back‑propagation concept.
Ablation studies confirm the contribution of each component: (1) removing MSBP degrades performance especially under limited data; (2) substituting L2 regularization with KL loss increases training time and memory usage while offering no accuracy benefit; (3) using a single, non‑step‑specific hypernetwork reduces specificity in sequential editing. The paper also analyses the trade‑off between the number of back‑propagation steps and computational cost, showing diminishing returns beyond three steps.
In summary, EMSEdit advances model editing by (i) enhancing data efficiency through multi‑step gradient‑activation learning, (ii) improving training efficiency via norm‑based regularization, and (iii) providing flexible hypernetwork designs for both sequential and batch editing. The method enables practical, real‑time knowledge updates for LLMs deployed in dynamic environments such as the web, where labeling resources are limited and rapid adaptation is essential. Future work is suggested on scaling EMSEdit to even larger multimodal models and on longitudinal studies of edited model behavior.