Watermarking Language Models with Error Correcting Codes


Recent progress in large language models enables the creation of realistic machine-generated content. Watermarking is a promising approach to distinguish machine-generated text from human text, embedding statistical signals in the output that are ideally undetectable to humans. We propose a watermarking framework that encodes such signals through an error correcting code. Our method, termed robust binary code (RBC) watermark, introduces no noticeable degradation in quality. We evaluate our watermark on base and instruction fine-tuned models and find that our watermark is robust to edits, deletions, and translations. We provide an information-theoretic perspective on watermarking, a powerful statistical test for detection and for generating $p$-values, and theoretical guarantees. Our empirical findings suggest our watermark is fast, powerful, and robust, comparing favorably to the state-of-the-art.


💡 Research Summary

The paper introduces a novel watermarking framework for large language models (LLMs) that embeds a hidden statistical signal into generated text without degrading its quality, while remaining robust to a wide range of post‑generation attacks such as deletions, insertions, paraphrasing, and translation. The proposed method, called Robust Binary Code (RBC) watermark, leverages error‑correcting codes (ECC) and a Correlated Binary Sampling Channel (CBSC) to achieve three desiderata simultaneously: (1) next‑token distortion‑free generation, (2) high detection power even on short passages, and (3) strong robustness to text perturbations.

Method Overview
Each token in the model’s vocabulary V is mapped to a fixed‑length binary string via an injective converter Γ (ℓ ≈ ⌈log₂|V|⌉ bits). At generation time, a sliding window of the w_in previous tokens is converted to bits, XORed with a random secret string R, and interpreted as a k‑bit message M. M is then encoded with an error‑correcting code (ECC).
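The message construction above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy vocabulary, the index-based converter, and the specific secret string are all assumptions chosen for clarity; Γ, R, w_in, and M follow the paper's notation.

```python
# Hypothetical sketch of the RBC message construction.
# Assumptions: a toy 8-token vocabulary and an index-based
# injective converter; the real Γ, R, and window size differ.

VOCAB = ["the", "cat", "sat", "on", "mat", "a", "dog", "ran"]
ELL = 3  # bits per token: ceil(log2(|V|)) = 3

def gamma(token: str) -> str:
    """Injective converter Γ: token -> fixed-length ℓ-bit string."""
    return format(VOCAB.index(token), f"0{ELL}b")

def build_message(context: list[str], R: str, w_in: int = 2) -> str:
    """Convert the last w_in context tokens to bits and XOR with
    the secret string R to obtain the k-bit message M."""
    bits = "".join(gamma(t) for t in context[-w_in:])
    assert len(bits) == len(R), "R must be k = w_in * ℓ bits long"
    return "".join(str(int(b) ^ int(r)) for b, r in zip(bits, R))

R = "101100"  # shared secret (k = 2 * 3 = 6 bits)
M = build_message(["the", "cat", "sat"], R)
print(M)  # "cat","sat" -> 001|010, XORed with R -> 100110
```

In the full scheme, M would next be passed through the ECC encoder, and the resulting codeword bits would steer which tokens the sampler prefers at each step.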

