OD-Stega: LLM-Based Relatively Secure Steganography via Optimized Distributions
We consider coverless steganography where a Large Language Model (LLM) is used to generate stego-texts in combination with arithmetic coding. An efficient method should embed secret bits in as few language tokens as possible while keeping the stego-text as natural as possible. We show that this problem is equivalent to maximizing the entropy of a replacement probability distribution of the next token generation, subject to a constraint on the divergence between the new distribution and the original one produced by the LLM. A closed-form solution is provided under either the KL divergence or the total variation constraint. Several important practical issues are also tackled: 1) An often-overlooked tokenization mismatch issue is resolved with a simple prompt selection approach, 2) The combination of the optimized distribution and the vocabulary truncation technique is considered, and 3) The incorporation of the proposed approach with existing (potentially non-arithmetic-coding-based) techniques, e.g., the Discop technique.
💡 Research Summary
The paper introduces OD‑Stega, a novel framework for coverless steganography that leverages large language models (LLMs) to embed secret bits directly into generated text. Traditional steganography modifies an existing cover text, but LLM‑based approaches can produce fluent, human‑like text from scratch, offering higher capacity and better stealth. However, when an LLM’s next‑token distribution is highly peaked, the amount of information that can be hidden per token is limited.
OD‑Stega addresses this limitation by formulating the embedding problem as an entropy‑maximization under a divergence constraint. For each generation step i, the original LLM probability distribution Pᵢ over the vocabulary is replaced by a new distribution Qᵢ. The objective is to maximize the Shannon entropy H(Qᵢ) (which directly translates into the expected number of secret bits that can be encoded) while ensuring that the distance between Qᵢ and Pᵢ does not exceed a pre‑specified threshold δ. Two distance measures are considered: Kullback‑Leibler (KL) divergence and total variation (TV) distance. Because the constraints are convex, the optimization admits a global solution.
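Written out in the summary's notation (the exact constraint form here is a sketch consistent with the description above), the per-step problem is:

```latex
\max_{Q_i}\; H(Q_i) = -\sum_{j=1}^{N} Q_{ij}\log Q_{ij}
\qquad \text{s.t.} \qquad
D(Q_i \,\|\, P_i) \le \delta,\quad
\sum_{j=1}^{N} Q_{ij} = 1,\quad Q_{ij} \ge 0,
```

where $N$ is the vocabulary size and $D$ is either the KL divergence or the total variation distance. Since $H$ is concave and both constraint sets are convex, any stationary point is globally optimal.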
When KL divergence is used, the optimal Qᵢ has a closed‑form that corresponds to temperature scaling of the original distribution:
Qᵢⱼ = Pᵢⱼ^{1/T} / Σₖ Pᵢₖ^{1/T}, T ≥ 1
The temperature T is chosen such that KL(Qᵢ‖Pᵢ) = δ. As T increases, the distribution becomes more uniform, raising entropy but also increasing divergence; a simple bisection search finds the appropriate T. This result provides a theoretical justification for the empirical practice of raising temperature to obtain more “creative” text, now seen as an optimal strategy under a KL‑bounded security budget.
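The bisection step can be sketched in a few lines of plain Python (function names and the toy probability vector are illustrative, not from the paper; in practice Pᵢ would be the LLM's softmax output at step i):

```python
import math

def kl(q, p):
    """KL divergence D(q || p) in nats; assumes q is supported where p is."""
    return sum(qj * math.log(qj / pj) for qj, pj in zip(q, p) if qj > 0)

def temperature_scale(p, T):
    """The closed-form solution under the KL constraint: Q_j ∝ P_j^(1/T)."""
    w = [pj ** (1.0 / T) for pj in p]
    z = sum(w)
    return [wj / z for wj in w]

def solve_temperature(p, delta, t_max=100.0, iters=60):
    """Bisection on T >= 1 so that KL(Q_T || P) ≈ delta.
    KL(Q_T || P) increases monotonically in T (0 at T=1), so bisection works."""
    if kl(temperature_scale(p, t_max), p) < delta:
        return t_max  # budget not binding within the search range
    lo, hi = 1.0, t_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl(temperature_scale(p, mid), p) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a peaked distribution such as `p = [0.7, 0.2, 0.07, 0.03]` with `delta = 0.05`, the solver returns a `T > 1` whose scaled distribution has strictly higher entropy than `p` while meeting the KL budget.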
When TV distance is the constraint, the optimal solution follows a “water‑filling” scheme. Low‑probability tokens are raised to a common value α, while high‑probability tokens are lowered to a common value β, with the total adjustment split evenly to satisfy the TV budget δ. If δ is large enough, α = β = 1/N, yielding a uniform distribution. The paper supplies explicit formulas for α and β and visualizes the adjustment process.
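A minimal sketch of this water-filling scheme (illustrative names; TV is taken as half the L1 distance, so a budget δ means up to δ of probability mass is moved): clamp P into an interval [α, β] chosen by two bisections so that the mass raised equals the mass cut.

```python
def tv_waterfill(p, delta, iters=60):
    """Entropy-maximizing Q with (1/2) * sum |Q - P| <= delta:
    raise probabilities below alpha to alpha, cut those above beta to beta."""
    n = len(p)
    # If the budget reaches the uniform distribution, return it directly.
    if delta >= 0.5 * sum(abs(pj - 1.0 / n) for pj in p):
        return [1.0 / n] * n
    # beta: cap on large probabilities; cut mass decreases as beta grows.
    lo, hi = 1.0 / n, max(p)
    for _ in range(iters):
        beta = 0.5 * (lo + hi)
        if sum(max(pj - beta, 0.0) for pj in p) > delta:
            lo = beta
        else:
            hi = beta
    beta = 0.5 * (lo + hi)
    # alpha: floor on small probabilities; raised mass grows with alpha.
    lo, hi = min(p), 1.0 / n
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        if sum(max(alpha - pj, 0.0) for pj in p) < delta:
            lo = alpha
        else:
            hi = alpha
    alpha = 0.5 * (lo + hi)
    # Raised mass equals cut mass (= delta), so the result still sums to 1.
    return [min(max(pj, alpha), beta) for pj in p]
```

For example, `tv_waterfill([0.7, 0.2, 0.07, 0.03], 0.1)` cuts the top token to β = 0.6 and lifts the two tail tokens to α = 0.1, spending the TV budget exactly.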
Beyond the core theory, the authors tackle three practical challenges.
- Tokenizer Mismatch – LLM tokenizers guarantee that detokenizing a token sequence reproduces the original text, but the reverse is not unique; the same text can be tokenized differently on the sender and receiver sides. To mitigate this, the sender (Alice) prefixes the secret bitstream S with a short “verification” block B. She enumerates all possible B values, generates the corresponding stego‑text, and checks whether the receiver (Bob) can correctly decode the full sequence. Bob discards the leading B bits after decoding. This pre‑verification eliminates decoding errors caused by tokenization ambiguity.
- Vocabulary Truncation – Computing full‑vocab probability vectors is costly for modern LLMs with vocabularies of tens of thousands of tokens. The paper adopts a truncation strategy: only the top‑k tokens (by Pᵢ) are retained, the rest are set to zero, and the optimization is performed on this reduced set. The authors analytically bound the additional KL divergence introduced by truncation and empirically show that modest k (e.g., 500–1000) preserves most of the embedding capacity while dramatically reducing runtime.
- Integration with Existing Schemes – OD‑Stega is designed to be modular. The authors demonstrate its incorporation into Discop, a recent non‑arithmetic‑coding steganographic method. By feeding the optimized Qᵢ into Discop’s encoding pipeline, they achieve higher embedding rates without sacrificing the original method’s robustness.
Experimental evaluation spans several LLMs (GPT‑2, LLaMA) and diverse text domains. The results confirm that, for a given security budget δ, OD‑Stega consistently embeds 15–30% more bits per token than baseline coverless steganography (e.g., vanilla arithmetic coding with the raw LLM distribution). Human evaluation and automated steganalysis (using BERT‑based detectors) show negligible degradation in naturalness; detection rates remain close to random guessing when δ is small (near‑perfect security) and increase only modestly as δ grows, reflecting the intentional trade‑off between capacity and detectability.
In summary, OD‑Stega provides a principled, mathematically grounded method for “relatively secure” steganography: instead of insisting on perfect indistinguishability, it allows the system designer to calibrate how much deviation from the LLM’s native distribution is acceptable given an adversary’s limited detection capability. By solving a convex entropy‑maximization under KL or TV constraints, the framework yields closed‑form optimal distributions, offers practical solutions for tokenizer mismatches and large vocabularies, and integrates seamlessly with existing steganographic pipelines. This work opens a clear path toward more efficient, adaptable, and theoretically justified LLM‑based covert communication.