The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text
How much information about training samples can be leaked through synthetic data generated by Large Language Models (LLMs)? Overlooking the subtleties of information flow in synthetic data generation pipelines can lead to a false sense of privacy. In this paper, we assume an adversary has access to some synthetic data generated by an LLM. We design membership inference attacks (MIAs) that target the training data used to fine-tune the LLM that is then used to synthesize data. The strong performance of our MIAs shows that synthetic data leaks information about the training data. Further, we find that canaries crafted for model-based MIAs are sub-optimal for privacy auditing when only synthetic data is released. Such out-of-distribution canaries have limited influence on the model’s output when prompted to generate useful, in-distribution synthetic data, which drastically reduces their effectiveness. To tackle this problem, we leverage the mechanics of auto-regressive models to design canaries with an in-distribution prefix and a high-perplexity suffix that leave detectable traces in synthetic data. This enhances the power of data-based MIAs and provides a better assessment of the privacy risks of releasing synthetic data generated by LLMs.
💡 Research Summary
The paper investigates how much private information can be inferred from synthetic text generated by large language models (LLMs) that have been fine‑tuned on a private dataset. While prior work on privacy auditing has focused on “model‑based” membership inference attacks (MIAs) that assume the adversary can query the target model and observe its logits, this study addresses the more realistic scenario where only the synthetic data are released. The authors design “data‑based” MIAs that operate solely on the synthetic dataset and evaluate them on three benchmark text classification corpora (SST‑2, AG News, SNLI).
Two families of membership signals are proposed. The first fits an n‑gram language model to the synthetic data and uses the probability it assigns to a candidate “canary” sentence as the signal. The second computes similarity between the canary and each synthetic record, using either Jaccard string similarity or cosine similarity of pretrained embeddings, and aggregates the top‑k similarities. Both signals are fed into the RMIA (pairwise likelihood‑ratio) framework, which also incorporates a small number of reference (shadow) models to improve discrimination.
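The two signal families can be sketched compactly. Below is a minimal, illustrative Python version: the first function fits a Laplace-smoothed n-gram model on the synthetic corpus and returns the average log-probability it assigns to a candidate canary; the second returns the mean of the top-k Jaccard similarities between the canary and the synthetic records. Function names, the whitespace tokenizer, and the smoothing constant are my own simplifications, not the paper's exact implementation (which also feeds these signals into RMIA with reference models).

```python
import math
from collections import Counter


def ngram_logprob(canary, corpus, n=2, alpha=1.0):
    """Average per-n-gram log-probability of the canary under a
    Laplace-smoothed n-gram LM fit on the synthetic corpus.
    Higher values suggest the canary left traces in the synthetic data."""
    tokens = [tok for text in corpus for tok in text.split()]
    vocab = set(tokens)
    counts_n = Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    counts_ctx = Counter(tuple(tokens[i:i + n - 1])
                         for i in range(len(tokens) - n + 2))
    words = canary.split()
    logp = 0.0
    for i in range(n - 1, len(words)):
        gram = tuple(words[i - n + 1:i + 1])
        logp += math.log((counts_n[gram] + alpha) /
                         (counts_ctx[gram[:-1]] + alpha * len(vocab)))
    # Normalize so canaries of different lengths are comparable.
    return logp / max(1, len(words) - n + 1)


def topk_jaccard(canary, corpus, k=3):
    """Mean Jaccard (word-set) similarity between the canary and its
    k most similar synthetic records."""
    c = set(canary.split())
    sims = sorted((len(c & set(t.split())) / len(c | set(t.split()))
                   for t in corpus), reverse=True)
    return sum(sims[:k]) / k
```

In both cases a higher score is evidence of membership; the paper calibrates these raw scores against reference models before thresholding.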
Experiments show that data‑based MIAs achieve AUC scores of 0.74 (SST‑2), 0.68 (AG News), and 0.77 (SNLI), well above the 0.5 random‑guessing baseline, demonstrating that synthetic text can leak substantial membership information. The authors then examine the effectiveness of traditional “high‑perplexity” canaries—out‑of‑distribution sequences that are known to be strongly memorized by models. While such canaries improve model‑based attacks, they are largely invisible in the synthetic data because the LLM, when prompted for in‑distribution content, tends to avoid generating out‑of‑distribution tokens. Consequently, as canary perplexity increases, the success of data‑based MIAs drops sharply.
To overcome this limitation, the paper introduces a new canary design that combines an in‑distribution prefix with a high‑perplexity suffix. The prefix aligns with the domain‑specific prompt, making the model likely to reproduce it, while the suffix forces the model to memorize a difficult subsequence. By varying the length of the prefix (parameter F), the authors find that intermediate values (0 < F < max) consistently outperform both fully in‑distribution (F = max) and fully out‑of‑distribution (F = 0) canaries across all datasets. This hybrid design yields a noticeable gain (≈5–10 absolute AUC points) in data‑based attack performance.
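A minimal sketch of the hybrid construction, under simplifying assumptions: we keep the first F tokens of a real in-distribution record as the prefix and fill the rest with uniformly random vocabulary tokens as a cheap stand-in for the paper's high-perplexity suffix (which is actually sampled to be unlikely under the target model). The function name and parameters are illustrative, not from the paper.

```python
import random


def make_hybrid_canary(record, vocab, F, total_len=8, seed=0):
    """Build a canary with an in-distribution prefix and a random
    out-of-distribution suffix.

    F = total_len reduces to a fully in-distribution canary;
    F = 0 reduces to a fully out-of-distribution one.
    The random suffix is a simplification: the paper samples
    high-perplexity continuations scored by the target LLM.
    """
    rng = random.Random(seed)
    prefix = record.split()[:F]
    suffix = [rng.choice(vocab) for _ in range(total_len - len(prefix))]
    return " ".join(prefix + suffix)
```

Sweeping F over 0..total_len and measuring data-based MIA AUC at each value reproduces the trade-off the paper describes: the prefix makes the canary likely to surface in synthetic outputs, while the suffix makes the surfaced trace distinctive.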
Finally, the authors evaluate the attacks against a differentially private training pipeline that uses DP‑SGD with ε = 8. Under this regime, the strongest data‑based MIA collapses to random performance (AUC ≈ 0.5), confirming that differential privacy provides a robust defense against both model‑ and data‑based membership inference.
Overall, the contributions are threefold: (1) introducing effective data‑only MIAs that expose privacy risks of released synthetic text; (2) demonstrating that conventional high‑perplexity canaries are suboptimal for auditing synthetic data and proposing a more suitable hybrid canary; (3) showing that differential privacy effectively mitigates these risks. The work offers a practical toolkit for auditors and regulators to obtain lower bounds on privacy leakage when synthetic text is shared, complementing existing upper‑bound analyses based on formal privacy guarantees. Future directions include scaling the methodology to larger LLMs, exploring multi‑turn prompting, and integrating continuous monitoring of privacy leakage in production pipelines.