Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Modern language models (LMs) tend to memorize portions of their training data and emit verbatim spans. When the underlying sources are sensitive or copyright-protected, such reproduction raises issues of consent and compensation for creators and compliance risks for developers. We propose Anchored Decoding, a plug-and-play inference-time method for suppressing verbatim copying: it enables decoding from any risky LM trained on mixed-license data by keeping generation in bounded proximity to a permissively trained safe LM. Anchored Decoding adaptively allocates a user-chosen information budget over the generation trajectory and enforces per-step constraints that yield a sequence-level guarantee, enabling a tunable risk-utility trade-off. To make Anchored Decoding practically useful, we introduce a new permissively trained safe model (TinyComma 1.8B), as well as Anchored$_{\mathrm{Byte}}$ Decoding, a byte-level variant of our method that enables cross-vocabulary fusion via the ByteSampler framework (Hayase et al., 2025). We evaluate our methods across six model pairs on long-form evaluations of copyright risk and utility. Anchored and Anchored$_{\mathrm{Byte}}$ Decoding define a new Pareto frontier, preserving near-original fluency and factuality while eliminating up to 75% of the measurable copying gap (averaged over six copying metrics) between the risky baseline and a safe reference, at a modest inference overhead.


💡 Research Summary

The paper addresses the growing concern that large language models (LLMs) often memorize and reproduce verbatim text from their training data, which can include copyrighted material. Existing mitigation strategies—such as extensive data filtering and re‑training or using curated “seed‑word” lists—are either prohibitively expensive or overly restrictive. To overcome these limitations, the authors propose Anchored Decoding, an inference‑time technique that fuses the next‑token distributions of a high‑utility “risky” model (trained on mixed‑license data) with those of a “safe” model (trained exclusively on permissively licensed text).

The core of the method is a global KL budget $K$ that limits the total divergence of the generated sequence from the safe model. This global budget is decomposed into per-step caps $k_t$ via the chain rule for KL divergence, allowing the algorithm to enforce a local KL constraint at each decoding step while still guaranteeing the overall $K$-NAF ($K$-Near Access-Freeness) property. The optimal fused distribution at step $t$ is shown to be a weighted geometric mean of the safe and risky distributions, with the mixing weight determined by a Lagrange multiplier $\lambda$. Solving for $\lambda$ reduces to a one-dimensional root-finding problem, efficiently handled by a safeguarded Newton–Raphson routine.
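The per-step fusion can be sketched concretely. The following is a minimal illustration, not the paper's implementation: it assumes next-token distributions are available as NumPy arrays, and it replaces the paper's safeguarded Newton–Raphson solve for $\lambda$ with a simple bisection over the equivalent mixing weight (along the geometric path, the KL to the safe model is monotone in the weight, so bisection is valid). All function names here are my own.

```python
import numpy as np

def geometric_fusion(p_safe, p_risky, w):
    # Weighted geometric mean of the two distributions, renormalized.
    logq = (1.0 - w) * np.log(p_safe) + w * np.log(p_risky)
    q = np.exp(logq - logq.max())
    return q / q.sum()

def kl(p, q):
    # KL divergence D(p || q) for dense positive distributions.
    return float(np.sum(p * (np.log(p) - np.log(q))))

def fuse_with_cap(p_safe, p_risky, k_t, tol=1e-8):
    """Find the largest mixing weight w in [0, 1] such that
    KL(q_w || p_safe) <= k_t, by bisection. q_w interpolates from
    the safe model (w = 0) toward the risky model (w = 1)."""
    if kl(geometric_fusion(p_safe, p_risky, 1.0), p_safe) <= k_t:
        # The risky model itself fits within the per-step cap.
        return geometric_fusion(p_safe, p_risky, 1.0), 1.0
    lo, hi = 0.0, 1.0  # lo is always feasible, hi always infeasible
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl(geometric_fusion(p_safe, p_risky, mid), p_safe) <= k_t:
            lo = mid
        else:
            hi = mid
    return geometric_fusion(p_safe, p_risky, lo), lo
```

When the two models already agree, the cap is slack and the fused distribution sits at or near the risky model; when they diverge sharply (a sign of potential memorization), the weight shrinks and generation is pulled toward the safe model.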

Two practical enhancements improve budget utilization. First, a prefix debt $\delta_{\text{init}}(x)$ is computed from the prompt: the method identifies the top-$n$ tokens where the risky model's log-likelihood ratio over the safe model is highest, interpreting these as early signs of memorization. The debt is subtracted from the global budget, effectively forcing the decoder to rely more on the safe model in the initial steps for high-risk prompts. Second, an adaptive banking scheme tracks the actual KL spent at each step and rolls any unused budget forward, allowing the decoder to be more permissive when the models naturally agree and to conserve budget for later "spikes" of risk.
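The two enhancements above can be sketched as follows. This is a simplified stand-in, not the paper's exact scheme: the allocation rule here is a plain uniform split of the remaining budget over the remaining steps, and `prefix_debt` is a rough illustration of the top-$n$ log-ratio scoring; all names are assumptions.

```python
def prefix_debt(logp_risky, logp_safe, n):
    """Rough memorization signal from the prompt: sum the n largest
    per-token log-likelihood ratios of risky over safe (illustrative)."""
    ratios = sorted((r - s for r, s in zip(logp_risky, logp_safe)),
                    reverse=True)
    return max(sum(ratios[:n]), 0.0)

def allocate_step_cap(remaining_budget, remaining_steps):
    # Uniform split of whatever budget is left over the remaining steps.
    return remaining_budget / max(remaining_steps, 1)

def run_banking(K, kl_costs):
    """Simulate adaptive banking over len(kl_costs) decoding steps.
    kl_costs[t] is the KL the fused step would spend if unconstrained;
    the actual spend is clipped to the per-step cap, and unused budget
    rolls forward. Returns the per-step caps and actual spends."""
    T = len(kl_costs)
    remaining = K
    caps, spends = [], []
    for t in range(T):
        cap = allocate_step_cap(remaining, T - t)
        spend = min(kl_costs[t], cap)
        remaining -= spend
        caps.append(cap)
        spends.append(spend)
    return caps, spends
```

With prefix debt, one would call `run_banking(K - prefix_debt(...), ...)`: a high-risk prompt shrinks the budget before decoding even begins, while steps that underspend their cap leave more headroom for later, riskier steps.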

The authors also release TinyComma 1.8B, a compact safe model trained on 169.5 B tokens of openly licensed data and using the Llama 3.1 tokenizer, enabling direct token‑level fusion with many popular LLMs. To handle cases where tokenizers differ, they introduce Anchored Byte Decoding, a byte‑level analogue that operates on next‑byte distributions via the ByteSampler framework, thereby bypassing tokenizer incompatibility.

Empirical evaluation spans six risky-safe model pairs (e.g., TinyComma + Llama 3.1 70B, TinyComma + Gemma 2B). The authors assess copyright risk using six copying metrics (n-gram overlap, longest common subsequence, etc.) and utility via fluency and factuality scores. Results show that Anchored Decoding reduces the measurable copying gap by up to 75% on average relative to the risky baseline, while preserving near-original fluency, often within a few points of the risky model's scores. Inference overhead is modest; pairing a large risky model with the much smaller safe model incurs roughly 1.1x the runtime of the risky model alone.

Theoretical contributions include a proof (Theorem 3.1) that per‑step KL constraints compose to satisfy the global K‑NAF guarantee, and a demonstration (Proposition 3.3) that the weighted geometric mean is the optimal solution to the constrained optimization. The paper also shows that the framework extends to other divergences (e.g., ∞‑Rényi) and can be applied beyond text—any generative process where a trusted reference distribution is available.
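The composition argument can be written out explicitly. In notation assumed here (the paper's exact symbols may differ), with $q$ the fused decoding distribution and $p_{\mathrm{safe}}$ the safe model, the chain rule for KL divergence gives the sequence-level bound from the per-step caps, and the optimal per-step distribution takes the geometric-mean form:

```latex
% Chain rule: per-step caps compose into the global budget K.
\begin{aligned}
D_{\mathrm{KL}}\!\bigl(q(x_{1:T}) \,\|\, p_{\mathrm{safe}}(x_{1:T})\bigr)
  &= \sum_{t=1}^{T} \mathbb{E}_{x_{<t} \sim q}\,
     D_{\mathrm{KL}}\!\bigl(q(\cdot \mid x_{<t}) \,\|\,
       p_{\mathrm{safe}}(\cdot \mid x_{<t})\bigr)
  \;\le\; \sum_{t=1}^{T} k_t \;\le\; K.
\end{aligned}

% Optimal fused distribution at step t: a weighted geometric mean,
% with lambda_t determined by the per-step constraint.
q(y \mid x_{<t}) \;\propto\;
  p_{\mathrm{safe}}(y \mid x_{<t})^{\,1-\lambda_t}\,
  p_{\mathrm{risky}}(y \mid x_{<t})^{\,\lambda_t}.
```

The first identity is why enforcing each local cap $k_t$ suffices for the global $K$-NAF guarantee; the second is the form established as optimal in Proposition 3.3.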

In summary, Anchored Decoding offers a plug‑and‑play, training‑free solution that gives developers fine‑grained, provable control over copyright leakage while retaining most of the performance benefits of high‑capacity LLMs. By providing both token‑level and byte‑level fusion mechanisms and releasing a lightweight safe model, the work paves the way for broader, legally compliant deployment of powerful generative AI systems.

