EtCon: Edit-then-Consolidate for Reliable Knowledge Editing


Knowledge editing aims to update specific facts in large language models (LLMs) without full retraining. Prior efforts tune the knowledge-bearing layers of LLMs and achieve strong performance in controlled, teacher-forced evaluations, yet they still struggle in real-world autoregressive generation, which greatly limits their practical applicability. Our empirical analysis reveals two issues: (1) most methods degrade pre-trained capabilities after injecting new knowledge; (2) they may exhibit a discrepancy between stored parametric knowledge and inference-time autoregressive generation behavior. To address these issues, we propose EtCon, an edit-then-consolidate paradigm that couples targeted edits with post-edit consolidation. The framework comprises two stages: (1) Targeted Proximal Supervised Fine-Tuning (TPSFT) performs a constrained, targeted edit that updates parametric knowledge while controlling policy drift; (2) Group Relative Policy Optimization (GRPO) consolidates the edit by aligning autoregressive trajectories with the intended fact. Extensive experiments demonstrate that EtCon improves editing reliability and real-world generalization while better preserving pre-trained capabilities.


💡 Research Summary

The paper “EtCon: Edit‑then‑Consolidate for Reliable Knowledge Editing” addresses a critical gap in current knowledge‑editing techniques for large language models (LLMs). Existing methods—whether they directly modify model weights (in‑place parametric edits) or attach auxiliary modules (external‑assisted edits)—show promising results under teacher‑forced evaluations but often fail when the model generates text autoregressively in real‑world settings. The authors identify two root problems: (1) degradation of the model’s pre‑trained capabilities because edits are optimized on a tiny set of examples without explicit regularization, leading to over‑specialization and cumulative drift in lifelong editing scenarios; and (2) a mismatch between stored parametric knowledge and actual generation behavior, caused by the train‑test distribution shift between teacher‑forced prefixes (used during editing) and self‑generated prefixes (used at inference time).

To solve these issues, EtCon introduces a two‑stage pipeline: Targeted Proximal Supervised Fine‑Tuning (TPSFT) followed by Group Relative Policy Optimization (GRPO).

Stage 1 – TPSFT
TPSFT limits weight updates to the feed‑forward network (FFN) layers, which prior work has shown to be the primary locus of factual knowledge. By restricting updates to these layers, the method avoids unnecessary changes to other parts of the model, preserving overall competence. TPSFT also incorporates a PPO‑style probability‑ratio clipping (1 ± ε) that acts as a trust‑region constraint: the likelihood ratio between the current policy and a reference policy (initially the vanilla model, later the most recent edited model) is bounded, and gradients are zeroed when the ratio exceeds the upper bound. This prevents any single edit from dominating the parameter space. Additionally, each editing instance is enriched with a self‑generated chain‑of‑thought (CoT) explanation. The model first produces a CoT for the query, the answer is replaced with the desired fact, and the combined sequence becomes the supervision target. This CoT‑augmented supervision encourages the model to associate the new fact with valid reasoning patterns rather than memorizing a single input‑output mapping, further reducing over‑fitting.
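
The trust-region mechanism described above can be sketched as a token-level SFT loss with a PPO-style ratio clip, alongside freezing everything but the FFN sublayers. This is a minimal illustration, not the paper's exact objective; the loss form, the ε value, and the `"mlp"` module-name convention (Llama-style) are assumptions.

```python
import torch
import torch.nn.functional as F

def select_ffn_params(model):
    """Restrict updates to feed-forward (MLP) sublayers; freeze everything else.

    Assumes Llama-style module naming where FFN parameters contain "mlp";
    adjust the substring for other architectures.
    """
    for name, p in model.named_parameters():
        p.requires_grad = "mlp" in name

def tpsft_loss(logits, ref_logits, target_ids, eps=0.2):
    """Teacher-forced SFT loss with a PPO-style probability-ratio clip.

    logits / ref_logits: (seq_len, vocab) scores from the current and the
    frozen reference policy for the same teacher-forced sequence.
    Wherever the token-level likelihood ratio exceeds 1 + eps, the clamp
    detaches the term from the graph, so its gradient is zeroed, which is
    the trust-region behavior described in the text.
    """
    logp = F.log_softmax(logits, dim=-1).gather(
        -1, target_ids.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        ref_logp = F.log_softmax(ref_logits, dim=-1).gather(
            -1, target_ids.unsqueeze(-1)).squeeze(-1)
    ratio = torch.exp(logp - ref_logp)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # Maximizing min(ratio, clipped) caps the incentive to push the new
    # policy far above the reference on any single supervised token.
    return -torch.min(ratio, clipped).mean()
```

With this form, a token whose ratio already sits above the upper bound contributes a constant to the loss, so no single edit can keep pulling the parameters further away from the reference.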

Despite these safeguards, TPSFT remains a teacher‑forced training regime; during inference the model must rely on its own generated prefixes, which can still cause a knowledge‑behavior mismatch.

Stage 2 – GRPO
GRPO is a trajectory‑level reinforcement‑learning algorithm that optimizes a policy by comparing the relative rewards of a group of sampled generations for the same query. Starting from the TPSFT‑edited model (πθ_new), the method samples m complete autoregressive trajectories per query, scores each against the target answer, and computes a group‑relative advantage without needing a separate critic. The reference policy for KL‑regularization is set to the TPSFT model itself, ensuring that consolidation refines behavior while keeping the parametric edits intact. The reward function is a weighted sum of four components: factual accuracy, output format compliance, cleanliness (conciseness), and internal consistency (coherence between intermediate reasoning steps and the final answer). By directly optimizing on self‑generated trajectories, GRPO bridges the train‑inference gap, making the edited knowledge reliably surface during real generation.
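
The critic-free advantage estimate and the composite reward described above can be sketched as follows. This is an illustrative reduction of GRPO's group-relative baseline; the reward weights here are hypothetical, not the paper's values.

```python
import torch

def group_relative_advantages(rewards):
    """Advantage of each sampled trajectory relative to its group.

    For the m completions sampled from one query, the advantage is the
    reward standardized within the group (mean-centered, scaled by the
    group's standard deviation) -- no learned critic is needed.
    """
    r = torch.as_tensor(rewards, dtype=torch.float32)
    return (r - r.mean()) / (r.std(unbiased=False) + 1e-8)

def composite_reward(accuracy, fmt, cleanliness, consistency,
                     w=(0.5, 0.2, 0.1, 0.2)):
    """Weighted sum of the four reward components named in the text.

    The weights are illustrative placeholders; the paper's actual
    weighting is not specified in this summary.
    """
    return (w[0] * accuracy + w[1] * fmt
            + w[2] * cleanliness + w[3] * consistency)
```

In a full training loop, each advantage would weight the log-probability of its trajectory under the current policy, with a KL penalty back to the TPSFT-edited reference model.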

Experiments
The authors evaluate EtCon on four benchmarks—ZsRE, COUNTERFACT, MQuAKE‑CF‑v2 (1‑edit subset), and QAEdit—using 1,000 samples per dataset. Two instruction‑tuned LLMs are edited: Llama‑3‑8B‑Instruct and Qwen2.5‑7B‑Instruct. Evaluation follows a realistic protocol: autoregressive generation with natural stopping criteria and an LLM‑as‑a‑judge scoring system (OpenAI 2025). Three metrics are reported: Reliability (whether the model produces the new fact for the edited query), Generalization (performance on rephrased queries), and Locality (preservation of unrelated facts).
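
Aggregating the three metrics from per-example judge verdicts might look like the sketch below. The record field names (`edit_correct`, `paraphrase_correct`, `unrelated_preserved`) are hypothetical; the paper's judging harness is not detailed in this summary.

```python
def editing_metrics(records):
    """Aggregate per-example LLM-judge verdicts into the three metrics.

    `records` is a list of dicts with boolean verdicts: whether the
    edited fact was produced for the original query, for a paraphrased
    query, and whether an unrelated fact was left intact. Each metric
    is the fraction of examples for which the verdict is positive.
    """
    n = len(records)
    return {
        "reliability": sum(r["edit_correct"] for r in records) / n,
        "generalization": sum(r["paraphrase_correct"] for r in records) / n,
        "locality": sum(r["unrelated_preserved"] for r in records) / n,
    }
```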

Table 1 shows that adding the GRPO consolidation stage (+GRPO) dramatically improves both Reliability and Generalization across strong baselines such as FT‑M and AlphaEdit, while Locality remains largely unchanged. For example, FT‑M’s Reliability jumps from 16.6 % to 62.9 % (+46.3 points) after GRPO, and Generalization rises from 15.5 % to 52.7 % (+37.2 points). Similar gains are observed for AlphaEdit. Reward‑curve plots illustrate that EtCon quickly reaches high reliability within a few edit steps and then plateaus, indicating stable lifelong editing.

Contributions

  1. Empirical evidence that a post‑edit consolidation stage is essential for reliable knowledge editing under realistic generation conditions.
  2. TPSFT, a novel fine‑tuning scheme that combines targeted FFN updates, trust‑region ratio clipping, and CoT‑augmented supervision to preserve pre‑trained abilities while effecting edits.
  3. Integration of GRPO to align parametric updates with actual generation behavior, mitigating the knowledge‑behavior mismatch.
  4. Extensive experiments on two modern instruction‑tuned models that demonstrate superior performance in reliability, generalization, and locality compared to existing state‑of‑the‑art methods.

Significance and Future Directions
EtCon establishes a new paradigm where knowledge editing is treated as a two‑step process: first inject the fact at the parameter level with strong regularization, then consolidate it at the policy level through trajectory‑level reinforcement learning. This approach bridges the gap between static parametric knowledge and dynamic generation, making LLMs more amenable to continual, fine‑grained updates in production environments. Future work could explore scaling to larger models (tens of billions of parameters), handling simultaneous edits of multiple, potentially conflicting facts, and automating reward design (e.g., via human feedback or meta‑learning) to further improve consolidation efficiency.

