dgMARK: Decoding-Guided Watermarking for Diffusion Language Models


We propose dgMARK, a decoding-guided watermarking method for discrete diffusion language models (dLLMs). Unlike autoregressive models, dLLMs can generate tokens in arbitrary order. While an ideal conditional predictor would be invariant to this order, practical dLLMs exhibit strong sensitivity to the unmasking order, creating a new channel for watermarking. dgMARK steers the unmasking order toward positions whose high-reward candidate tokens satisfy a simple parity constraint induced by a binary hash, without explicitly reweighting the model’s learned probabilities. The method is plug-and-play with common decoding strategies (e.g., confidence, entropy, and margin-based ordering) and can be strengthened with a one-step lookahead variant. Watermarks are detected via elevated parity-matching statistics, and a sliding-window detector ensures robustness under post-editing operations including insertion, deletion, substitution, and paraphrasing.


💡 Research Summary

The paper introduces dgMARK, a novel watermarking technique designed specifically for discrete diffusion language models (dLLMs). Unlike traditional watermarking methods for autoregressive language models, which bias token probabilities toward “green” tokens based on a secret key, dgMARK exploits the order‑sensitivity of dLLMs. In diffusion models, generation proceeds by iteratively unmasking tokens, and the model can reveal tokens in any order. While an ideal model would be order‑invariant, practical dLLMs show measurable differences depending on the unmasking sequence. dgMARK turns this sensitivity into a watermarking channel without altering the learned probability distribution.

The method works as follows. A secret key ξ defines a deterministic binary hash f(v, ξ) that maps each vocabulary token v to 0 or 1. For each position i in the sequence, the vocabulary splits into a parity‑matching set G_i (tokens with f(v, ξ) ≡ i (mod 2)) and its complement R_i. During decoding, any standard strategy F (confidence‑based, entropy‑based, margin‑based, etc.) supplies, for each still‑masked position j, a candidate token v_j and a reward r_j (e.g., the model’s confidence, negative entropy, or top‑1/top‑2 gap). dgMARK restricts attention to positions whose candidate token lies in G_j, selects the one with the highest reward, and unmasks it; if no such position exists, it falls back to the full set of masked positions. This procedure is summarized in Algorithm 2 and is completely plug‑and‑play: it wraps around any existing decoding policy without re‑weighting the model’s probability distribution.
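A minimal sketch of this greedy selection step. The function names, the SHA‑256‑based hash construction, and the dict-based interface are illustrative assumptions, not the paper's implementation:

```python
import hashlib

def parity_hash(token_id: int, key: bytes) -> int:
    """Deterministic binary hash f(v, xi): maps a token id to 0 or 1.
    (SHA-256 keying is an illustrative choice.)"""
    digest = hashlib.sha256(key + token_id.to_bytes(4, "big")).digest()
    return digest[0] & 1

def select_position(masked_positions, candidates, rewards, key):
    """Choose the next position to unmask.

    candidates[j] and rewards[j] come from any base decoding strategy F
    (e.g., model confidence). Prefer positions whose candidate token's
    hash parity matches the positional parity j % 2; if none exists,
    fall back to the unrestricted reward argmax.
    """
    matching = [j for j in masked_positions
                if parity_hash(candidates[j], key) == j % 2]
    pool = matching if matching else list(masked_positions)
    return max(pool, key=lambda j: rewards[j])
```

Note that the model's per-position distributions are never reweighted; only the order in which positions are committed changes.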

To strengthen the signal, the authors propose a one‑step lookahead variant. Instead of greedily picking the single best parity‑matching position, they collect the top‑k candidates by reward, simulate a one‑step update for each, and count how many parity‑matching opportunities would remain in the next step (function g(j)). The candidate with the largest g(j) is chosen, breaking ties by reward. When k = 1 the method reduces to the greedy baseline; larger k improves embedding strength at the cost of extra computation.
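The lookahead variant can be sketched as follows. The `simulate` callback, which hypothetically unmasks a position and reruns the base decoder to obtain the next round of candidates, is an assumed interface:

```python
import hashlib

def parity_hash(token_id: int, key: bytes) -> int:
    """Deterministic binary hash f(v, xi): maps a token id to 0 or 1."""
    digest = hashlib.sha256(key + token_id.to_bytes(4, "big")).digest()
    return digest[0] & 1

def select_with_lookahead(masked_positions, candidates, rewards, key,
                          simulate, k=4):
    """One-step lookahead selection (k = 1 reduces to the greedy rule).

    simulate(j) is a caller-supplied callback that hypothetically
    unmasks position j, reruns the base decoder F, and returns the
    remaining masked positions with their new candidate tokens.
    """
    matching = [j for j in masked_positions
                if parity_hash(candidates[j], key) == j % 2]
    pool = matching if matching else list(masked_positions)
    topk = sorted(pool, key=lambda j: rewards[j], reverse=True)[:k]

    def g(j):
        # parity-matching opportunities left after unmasking j
        next_positions, next_candidates = simulate(j)
        return sum(parity_hash(next_candidates[p], key) == p % 2
                   for p in next_positions)

    # maximize g(j); break ties by the original reward
    return max(topk, key=lambda j: (g(j), rewards[j]))
```

Each call to `simulate` costs one extra forward pass, which is why larger k trades embedding strength against compute.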

Detection is performed by measuring the proportion of tokens that satisfy the parity condition in the generated text. Random text yields an expected rate of 0.5, whereas dgMARK‑watermarked text exhibits a statistically significant increase. To handle post‑generation edits (insertions, deletions, substitutions, paraphrasing), the authors employ a sliding‑window detector that computes the parity‑matching statistic over overlapping windows, preserving detectability despite alignment shifts.
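A sketch of such a detector, using a normal-approximation z-score against the 0.5 chance rate. The window size, stride, threshold, and the trick of testing both positional-parity offsets are illustrative choices, not taken from the paper:

```python
import hashlib
import math

def parity_hash(token_id: int, key: bytes) -> int:
    """Deterministic binary hash f(v, xi): maps a token id to 0 or 1."""
    digest = hashlib.sha256(key + token_id.to_bytes(4, "big")).digest()
    return digest[0] & 1

def window_zscore(tokens, key, offset):
    """z-score of the parity-match count in one window; under the null
    (unwatermarked text) each token matches with probability 1/2."""
    n = len(tokens)
    matches = sum(parity_hash(v, key) == (offset + i) % 2
                  for i, v in enumerate(tokens))
    return (matches - 0.5 * n) / math.sqrt(0.25 * n)

def detect(token_ids, key, window=64, stride=16, threshold=4.0):
    """Slide fixed-size windows over the token sequence and flag the
    text as watermarked if any window's parity-match rate is
    significantly above chance. Both positional-parity offsets are
    tried, since insertions/deletions can shift alignment by one."""
    best = float("-inf")
    for s in range(0, max(1, len(token_ids) - window + 1), stride):
        chunk = token_ids[s:s + window]
        z = max(window_zscore(chunk, key, 0), window_zscore(chunk, key, 1))
        best = max(best, z)
    return best >= threshold, best
```

Under this normal approximation, a per-window threshold of z ≥ 4 corresponds to a false-positive rate of roughly 3×10⁻⁵ before any multiple-testing correction across windows.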

Empirical evaluation uses state‑of‑the‑art dLLMs such as LLaDA and Dream across multiple benchmark datasets. The authors compare dgMARK against several baselines: (i) traditional autoregressive watermarking (green/red token bias), (ii) recent diffusion‑specific watermarks that still bias token probabilities, and (iii) a no‑watermark control. Quality metrics (BLEU, ROUGE, perplexity) show negligible degradation relative to the no‑watermark baseline, confirming that the method is essentially distortion‑free. Detection accuracy exceeds 95% under clean conditions and remains above 90% after aggressive random editing (up to 30% of tokens inserted, deleted, or substituted). The one‑step lookahead variant further improves detection rates with modest computational overhead.

The paper’s contributions are threefold: (1) it demonstrates empirically that decoding order in dLLMs is a viable watermarking channel, opening a new dimension of model‑level provenance; (2) it provides a non‑invasive watermark that preserves generation quality while offering strong statistical guarantees; (3) it delivers a modular framework compatible with any decoding strategy, facilitating easy integration into existing pipelines.

Future work suggested includes multi‑key schemes and key rotation for enhanced security, extending the approach to other diffusion‑generated modalities (images, audio), and developing defenses against adaptive adversaries who might attempt to erase the watermark by targeted editing or optimization. Overall, dgMARK represents a significant step toward robust, low‑impact provenance tracking for the emerging class of diffusion‑based generative language models.

