Towards Anytime-Valid Statistical Watermarking

Reading time: 5 minutes

📝 Original Info

  • Title: Towards Anytime-Valid Statistical Watermarking
  • ArXiv ID: 2602.17608
  • Date: 2026-02-19
  • Authors: Not available (author information is not stated in the paper body)

📝 Abstract

The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promising solution, existing methods suffer from two critical limitations: the lack of a principled approach for selecting sampling distributions and the reliance on fixed-horizon hypothesis testing, which precludes valid early stopping. In this paper, we bridge this gap by developing the first e-value-based watermarking framework, Anchored E-Watermarking, that unifies optimal sampling with anytime-valid inference. Unlike traditional approaches, where optional stopping invalidates Type-I error guarantees, our framework enables valid anytime inference by constructing a test supermartingale for the detection process. By leveraging an anchor distribution to approximate the target model, we characterize the optimal e-value with respect to the worst-case log-growth rate and derive the optimal expected stopping time. Our theoretical claims are substantiated by simulations and evaluations on established benchmarks, showing that our framework can significantly enhance sample efficiency, reducing the average token budget required for detection by 13-15% relative to state-of-the-art baselines.

💡 Deep Analysis

📄 Full Content

The revolutionary success of Large Language Models (LLMs) at generating human-like texts (Brown et al., 2020; Bubeck et al., 2023; Chowdhery et al., 2023) has raised several societal concerns regarding the misuse of LLM outputs. Unregulated LLM outputs pose risks ranging from the contamination of future training corpora (Shumailov et al., 2023; Das et al., 2024) to the propagation of disinformation (Zellers et al., 2019; Vincent, 2022) and academic misconduct (Jarrah et al., 2023; Milano et al., 2023). Consequently, there is an urgent need for reliable mechanisms to detect LLM-generated text.

Departing from classical p-value analysis, we adopt e-values as the central detection paradigm. E-values (Vovk, 1993; Shafer et al., 2011; Vovk & Wang, 2021) are nonnegative random variables E satisfying E_{H0}[E] ≤ 1 under the null. Unlike p-values, e-values arise from supermartingales, so Type-I error guarantees are preserved under optional stopping based on ongoing analysis of the data. We analyze the optimal e-value for watermark detection with respect to the worst-case log-growth rate (Kelly, 1956) and the expected stopping time (Grünwald et al., 2020; Waudby-Smith et al., 2025), thus fully characterizing the average per-step growth rate of evidence and the sample efficiency. Our main theoretical contributions are summarized informally as follows:
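The optional-stopping property can be made concrete with a toy sketch. This is not the paper's Anchored E-Watermarking scheme: it uses per-token likelihood-ratio e-values for a hypothetical green-list watermark (parameters gamma and gamma_wm are illustrative assumptions). The running product of e-values is a supermartingale under H0, so by Ville's inequality the detector may stop the first time it crosses 1/α while keeping Type-I error at most α.

```python
import random

def evalue_step(is_green, gamma=0.5, gamma_wm=0.7):
    """Likelihood-ratio e-value for one token of a toy green-list watermark.

    Under H0 (no watermark) a token is 'green' with probability gamma, so
    E_H0[e] = gamma*(gamma_wm/gamma) + (1-gamma)*((1-gamma_wm)/(1-gamma)) = 1.
    """
    return gamma_wm / gamma if is_green else (1 - gamma_wm) / (1 - gamma)

def sequential_test(green_flags, alpha=0.01):
    """Multiply e-values into a wealth process (a supermartingale under H0).

    By Ville's inequality, P_H0(wealth ever reaches 1/alpha) <= alpha, so
    stopping the moment the threshold is crossed keeps Type-I error <= alpha.
    """
    wealth = 1.0
    for t, is_green in enumerate(green_flags, 1):
        wealth *= evalue_step(is_green)
        if wealth >= 1 / alpha:
            return t  # reject H0: watermark detected after t tokens
    return None  # evidence never became sufficient

random.seed(0)
# Simulated watermarked text: green tokens occur with probability 0.7.
wm_stream = [random.random() < 0.7 for _ in range(2000)]
print(sequential_test(wm_stream))  # typically stops far short of the 2000-token budget
```

The key contrast with a fixed-horizon p-value test is that the decision to stop can depend on the data seen so far without inflating the false-positive rate.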

Theorem 1.1 (Informal version of Theorem 4.1 and Theorem 4.3). The optimal worst-case log-growth rate J* = E_{H1}[log E] under the alternative H1 is characterized in terms of h = H(p0), the Shannon entropy of the anchor distribution p0, and a robustness tolerance parameter δ > 0 (the exact expression is given in Theorem 4.1). Furthermore, the optimal expected stopping time to achieve a Type-I error α scales as log(1/α) / J*.
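To get a feel for the log(1/α) / J* scaling, the sketch below plugs in illustrative numbers. The growth rate J here is the one for the toy Bernoulli green-list example (a KL divergence), not the paper's J*, and the parameters are assumptions chosen for illustration.

```python
import math

def growth_rate(gamma=0.5, gamma_wm=0.7):
    """Per-token log-growth rate J = E_H1[log e] for the Bernoulli green-list
    example: the KL divergence between Bernoulli(gamma_wm) and Bernoulli(gamma)."""
    return (gamma_wm * math.log(gamma_wm / gamma)
            + (1 - gamma_wm) * math.log((1 - gamma_wm) / (1 - gamma)))

def expected_budget(alpha, J):
    """Tokens needed for the wealth process to reach 1/alpha, following the
    log(1/alpha) / J* scaling of the stopping time."""
    return math.log(1 / alpha) / J

J = growth_rate()
for alpha in (1e-2, 1e-4, 1e-6):
    print(f"alpha={alpha:g}: about {expected_budget(alpha, J):.0f} tokens")
```

Note the mild dependence on α: tightening the Type-I error level by two orders of magnitude only adds a constant number of tokens, since the budget grows logarithmically in 1/α.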

To the best of our knowledge, this work represents the first application of e-values to the domain of statistical watermarking. By enabling valid sequential testing, our framework significantly improves detection efficiency, allowing the system to flag machine-generated text with fewer tokens than fixed-horizon counterparts. The ability to stop early also enhances robustness against adaptive attacks: because the detector can terminate immediately upon accumulating sufficient evidence, the watermark remains effective even if the attacker perturbs the text heavily in later segments (post-stopping). Consequently, our approach offers a theoretically rigorous and practically superior alternative to existing heuristic detection methods. Through experiments on a real watermarking benchmark (Piet et al., 2023), we show that the theoretical scheme implied by Theorem 1.1 achieves consistently higher sample efficiency than state-of-the-art methods, reducing token consumption by 13-15% across various temperature settings.
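The post-stopping robustness claim can be illustrated with a small simulation, again using the hypothetical green-list e-values rather than the paper's actual detector: when the watermarked prefix is clean, the sequential test typically crosses its threshold well before an attacker's heavy paraphrasing of later segments can matter.

```python
import random

def run_detector(stream, alpha=0.01, gamma=0.5, gamma_wm=0.7):
    """Anytime-valid detector: stop the first time the e-process (running
    product of per-token likelihood-ratio e-values) exceeds 1/alpha."""
    wealth, threshold = 1.0, 1 / alpha
    for t, is_green in enumerate(stream, 1):
        wealth *= gamma_wm / gamma if is_green else (1 - gamma_wm) / (1 - gamma)
        if wealth >= threshold:
            return t
    return None

random.seed(1)
clean = [random.random() < 0.7 for _ in range(500)]      # watermarked prefix
attacked = [random.random() < 0.5 for _ in range(1500)]  # heavily paraphrased suffix
stop = run_detector(clean + attacked)
print(stop)  # detection typically fires well inside the clean prefix
```

A fixed-horizon test over all 2000 tokens would dilute the watermark signal with the attacked suffix, whereas the sequential detector has already committed to a decision by then.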

Statistical watermarking. Watermarking offers a white-box provenance mechanism for detecting LLM-generated text (Tang et al., 2023), complementing post-hoc detectors and provenance tools developed for neural text generation (Zellers et al., 2019). Classical digital watermarking and steganography provide a broad toolbox for embedding and extracting imperceptible signals under benign or adversarial channel edits (Cox et al., 2007). Early works in the NLP literature studied watermarking and tracing of text via editing or synonym substitutions (Venugopal et al., 2011; Rizzo et al., 2019; Abdelnabi & Fritz, 2021; Yang et al., 2022; Kamaruddin et al., 2018). In contrast, modern statistical (a.k.a. generative) watermarking (Aaronson, 2022a; Kirchenbauer et al., 2023) injects a secret, testable distributional bias into the sampling process, and detects this bias via hypothesis testing on the generated token sequence. A rapidly growing theory studies efficiency and optimality of watermarking tests and encoders: results include finite-sample guarantees, information-theoretic limits, and constructions that are distortion-free or unbiased (Huang et al., 2023; Zhao et al., 2023; Li et al., 2024; Block et al., 2025; Kuditipudi et al., 2023; Hu et al., 2023; Xie et al., 2025). Negative and hardness results highlight fundamental limitations against adaptive or distribution-matching adversaries (Christ et al., 2023; Christ & Gunn, 2024; Golowich & Moitra, 2024), motivating alternative design considerations such as distribution-preserving and public-key schemes (Wu et al., 2023; Liu et al., 2023; Fairoze et al., 2023). Empirical robustness is commonly assessed under paraphrasing, editing, and translation, with recent work studying cross-lingual failure modes and defenses (He et al., 2024b), and benchmarks/frameworks such as MarkMyWords and scalable pipelines for watermark evaluation and deployment (Piet et al., 2023; Zhang et al., 2024; Lau et al., 2024; Dathathri et al., 2024).
Finally, a complementary line of work leverages semantic structure and auxiliary models to boost detection power under benign distributional structure, including semantic/paraphrastic watermarks and speculative-sampling-based schemes (Ren et al., 2024; Liu & Bu, 2024; Hou et al., …).

Reference

This content is AI-processed based on open access ArXiv data.
