Look Twice before You Leap: A Rational Agent Framework for Localized Adversarial Anonymization

Notice: This research summary and analysis were automatically generated using AI technology. For accuracy, please refer to the original arXiv source.

Current LLM-based text anonymization frameworks usually rely on remote API services backed by powerful LLMs, which creates an inherent privacy paradox: users must disclose their data to untrusted third parties in order to obtain privacy protection. Moreover, directly migrating current solutions to local small-scale models (LSMs) yields suboptimal results with severe utility collapse. Our work argues that this failure stems not merely from the capability deficits of LSMs, but significantly from the inherent irrationality of the greedy adversarial strategies employed by current state-of-the-art (SOTA) methods. To address this, we propose Rational Localized Adversarial Anonymization (RLAA), a fully localized and training-free framework featuring an Attacker-Arbitrator-Anonymizer architecture. We model the anonymization process as a trade-off between Marginal Privacy Gain (MPG) and Marginal Utility Cost (MUC), and demonstrate that greedy strategies tend to drift into an irrational state. Instead, RLAA introduces an arbitrator that acts as a rationality gatekeeper, validating the attacker's inferences to filter out feedback that provides negligible privacy benefit. This mechanism promotes a rational early-stopping criterion and structurally prevents utility collapse. Extensive experiments on different benchmarks demonstrate that RLAA achieves a superior privacy-utility trade-off compared to strong baselines.


💡 Research Summary

The paper addresses a critical privacy paradox in current large‑language‑model (LLM) based text anonymization: state‑of‑the‑art methods rely on powerful LLMs accessed through remote APIs, which forces users to transmit raw, sensitive text to untrusted third‑party services. While migrating these adversarial anonymization frameworks to locally run small‑scale models (LSMs) seems attractive, a naïve migration leads to severe utility collapse. The authors argue that this collapse is not merely due to the limited capabilities of LSMs but stems from the irrationality of greedy adversarial strategies that dominate existing pipelines such as Feedback‑guided Adversarial Anonymization (FgAA).

To formalize the problem, the authors model each anonymization step as an economic transaction. They define Marginal Privacy Gain (MPG) as the reduction in an attacker’s inference accuracy after a step, and Marginal Utility Cost (MUC) as the loss in semantic quality. The ratio MRS = MUC/MPG represents the “price” of privacy protection. A rational agent should keep MRS below a pre‑specified threshold λ; greedy agents, however, often let MRS diverge to infinity, causing unnecessary edits that erode utility while providing negligible privacy benefit.
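The economic framing above can be made concrete with a short sketch. This is an illustrative reconstruction from the definitions in the summary, not the paper's implementation: `marginal_rates` computes MPG as the drop in attacker accuracy and MUC as the drop in utility, and the stopping rule compares MRS against the threshold λ.

```python
def marginal_rates(p_atk_before, p_atk_after, utility_before, utility_after):
    """Compute MPG, MUC, and MRS = MUC/MPG for one anonymization step."""
    mpg = p_atk_before - p_atk_after       # Marginal Privacy Gain: reduction in attacker accuracy
    muc = utility_before - utility_after   # Marginal Utility Cost: loss in semantic quality
    # A step with no privacy gain has an infinite "price" of protection.
    mrs = float("inf") if mpg <= 0 else muc / mpg
    return mpg, muc, mrs


def rational_to_continue(mrs, lam=1.0):
    """A rational agent keeps editing only while the price of privacy stays below λ."""
    return mrs <= lam
```

A greedy agent is one that ignores `rational_to_continue` entirely: when MPG approaches zero, MRS diverges, and every further edit pays unbounded utility for no privacy.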

The proposed solution, Rational Localized Adversarial Anonymization (RLAA), is a training‑free, fully local framework that introduces an Attacker‑Arbitrator‑Anonymizer (A‑A‑A) architecture. The attacker is an LSM that attempts to re‑identify personal attributes from the current text. The arbitrator acts as a rationality gatekeeper: it validates the attacker’s inferred leaks using a meta‑reasoning‑inspired verification step. If the inferred leak does not yield a sufficient MPG (i.e., the MRS would exceed λ), the arbitrator discards the feedback, preventing the anonymizer from making a destructive edit. This early‑stopping mechanism enforces rational decision‑making and structurally prevents utility collapse.
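The control flow of the A-A-A loop can be sketched as follows. The callables `attacker`, `arbitrator_validates`, and `anonymizer` stand in for local LSM calls; their names and signatures are assumptions made for illustration, not the paper's API.

```python
def rlaa_loop(text, attacker, arbitrator_validates, anonymizer, max_rounds=5):
    """Illustrative sketch of the Attacker-Arbitrator-Anonymizer loop.

    attacker:            returns an inferred personal-attribute leak, or None.
    arbitrator_validates: rationality gatekeeper; rejects leaks whose edit
                          would yield negligible Marginal Privacy Gain.
    anonymizer:          rewrites the text to remove a validated leak.
    """
    for _ in range(max_rounds):
        leak = attacker(text)
        if leak is None:
            break  # nothing left for the attacker to infer
        if not arbitrator_validates(text, leak):
            break  # rational early stop: the edit is not worth its utility cost
        text = anonymizer(text, leak)
    return text
```

The key structural point is that the anonymizer only ever sees arbitrator-approved feedback, so destructive low-gain edits are blocked before they happen rather than repaired afterwards.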

RLAA is evaluated on two large Reddit‑derived datasets (PersonalReddit and reddit‑self‑disclosure) using three popular LSMs: Llama‑3‑8B, Qwen2.5‑7B, and DeepSeek‑V3.2‑Exp. Baselines include the original FgAA, SEAL (which distills knowledge from a powerful LLM via supervised fine‑tuning), IncogniText, and several traditional NER‑based methods. Metrics comprise privacy risk (P_atk), utility (BLEU, ROUGE‑L, semantic similarity), and the newly introduced MRS. RLAA consistently achieves lower privacy risk (often <0.12) while maintaining or improving utility scores by 10‑15 % relative to baselines. Moreover, RLAA reduces the average MRS by over 30 %, indicating a more efficient trade‑off between privacy gain and utility loss. In Pareto analyses, RLAA dominates all baselines on the reddit‑self‑disclosure benchmark, confirming its superior balance.
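The Pareto claim above means no baseline simultaneously matches RLAA on both axes. A minimal dominance check, using hypothetical (privacy risk, utility) pairs rather than the paper's actual numbers, looks like this:

```python
def dominates(a, b):
    """True if method a Pareto-dominates method b.

    Each point is a (privacy_risk, utility) pair: lower risk is better,
    higher utility is better. Dominance requires a to be no worse on both
    axes and strictly better on at least one.
    """
    risk_a, util_a = a
    risk_b, util_b = b
    no_worse = risk_a <= risk_b and util_a >= util_b
    strictly_better = risk_a < risk_b or util_a > util_b
    return no_worse and strictly_better
```

Saying RLAA "dominates all baselines" on a benchmark is the claim that `dominates(rlaa_point, baseline_point)` holds for every baseline point.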

Ablation studies show that removing the arbitrator (i.e., reverting to a purely greedy loop) reproduces the utility collapse observed in naïve LSM migrations, confirming the central role of rational gating. The authors also discuss the sensitivity of the λ threshold and propose future work on adaptive λ tuning, multi‑attacker scenarios, and extensions to non‑text modalities.

In summary, RLAA offers a principled, locally deployable anonymization framework that mitigates the privacy paradox of remote LLM APIs and overcomes the utility degradation of greedy adversarial methods. By embedding economic rationality through an arbitrator, it aligns privacy protection with semantic preservation, delivering a Pareto‑optimal solution across diverse small‑scale language models.

