Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning
Membership inference attacks (MIAs) pose a significant privacy threat in federated learning (FL), as they allow adversaries to determine whether a client’s private dataset contains a specific data sample. While defenses against membership inference attacks in standard FL have been well studied, the recent shift toward federated fine-tuning has introduced new, largely unexplored attack surfaces. To highlight this vulnerability in the emerging FL paradigm, we demonstrate that federated prompt-tuning, which adapts pre-trained models with small input prefixes to improve efficiency, also exposes a new vector for privacy attacks. We propose PromptMIA, a membership inference attack tailored to federated prompt-tuning, in which a malicious server inserts adversarially crafted prompts and monitors their updates during collaborative training to accurately determine whether a target data point is in a client’s private dataset. We formalize this threat as a security game and empirically show that PromptMIA consistently attains high advantage in this game across diverse benchmark datasets. Our theoretical analysis further establishes a lower bound on the attack’s advantage, which explains and supports the consistently high advantage observed in our empirical results. We also investigate the effectiveness of standard membership inference defenses originally developed for gradient- or output-based attacks and analyze their interaction with the distinct threat landscape posed by PromptMIA. The results highlight non-trivial challenges for current defenses and offer insights into their limitations, underscoring the need for defense strategies that are specifically tailored to prompt-tuning in federated settings.
💡 Research Summary
The paper introduces PromptMIA, a novel membership inference attack (MIA) that targets federated prompt‑tuning (FPT), a recent federated learning (FL) paradigm where a pre‑trained foundation model is kept frozen and only lightweight soft prompts (key‑prompt pairs) are learned and exchanged. In this setting, clients select the top‑N prompts for each input by matching a query vector q(x) (derived from the input) against stored keys using cosine similarity, then locally update the selected prompts. The authors show that a malicious server can exploit this selection mechanism to infer whether a specific target sample T is present in a client’s private dataset.
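The key-matching selection described above can be sketched as a few lines of numpy. The function names (`cosine_sim`, `select_prompts`) and the random toy data are illustrative assumptions, not the paper's implementation; the only mechanism taken from the text is ranking stored keys by cosine similarity to the query vector q(x) and taking the top N.

```python
import numpy as np

def cosine_sim(query, keys):
    """Cosine similarity between a query vector and each row of a key matrix."""
    q = query / np.linalg.norm(query)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    return k @ q

def select_prompts(query, keys, n):
    """Return indices of the top-n keys most similar to the query vector,
    i.e. the prompts a client would select and locally update for this input."""
    sims = cosine_sim(query, keys)
    return np.argsort(sims)[::-1][:n]
```

Because selection depends only on q(x) and the broadcast keys, whoever controls the key pool can steer which prompts a given input will touch — the property the attack exploits.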
PromptMIA works as follows. Before a training round, the server computes the query vector q(T) for the target sample and generates N adversarial keys k_adv that have higher cosine similarity to q(T) than any benign key, while ensuring the adversarial keys are sufficiently diverse (controlled by the parameters δ_min and Δ). Each adversarial key is paired with a prompt (the prompt content itself can be arbitrary) and injected into the global prompt pool by replacing N existing keys. When the modified pool is broadcast, any client that holds T will inevitably select all N adversarial prompts, because they are the most similar to q(T). The client then updates these prompts locally and returns them to the server. By simply monitoring which injected prompts have been updated, the server can decide with high confidence whether T was a member (b = 1) or not (b = 0). This inference requires only a single communication round and does not rely on gradients, model weights, or shadow-model training, dramatically reducing the attack's cost and assumptions.
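One simple way to realize the key-crafting step is to perturb q(T) with small orthogonal noise, sized so every crafted key still beats the best benign key. This is a minimal sketch under that assumption: the construction, the `margin` parameter (a stand-in for the paper's δ_min/Δ diversity controls), and the function name are all hypothetical, not the authors' method.

```python
import numpy as np

def craft_adversarial_keys(q, benign_keys, n, margin=0.01, rng=None):
    """Generate n diverse keys whose cosine similarity to q exceeds that of
    every benign key. Hypothetical construction: add orthogonal noise to q,
    with the noise magnitude bounded so similarity stays above a floor."""
    if rng is None:
        rng = np.random.default_rng(0)
    q = q / np.linalg.norm(q)
    s_max = np.max((benign_keys @ q) / np.linalg.norm(benign_keys, axis=1))
    target = min(s_max + margin, 0.999)          # required similarity floor
    eps = np.sqrt(1.0 / target**2 - 1.0)         # noise norm giving cos = target
    keys = []
    for _ in range(n):
        u = rng.standard_normal(q.shape)
        u -= (u @ q) * q                         # project out q: orthogonal direction
        u /= np.linalg.norm(u)
        k = q + 0.9 * eps * u                    # stay strictly above the floor
        keys.append(k / np.linalg.norm(k))
    return np.array(keys)
```

Since cos(q, q + εu) = 1/√(1 + ε²) for a unit vector u orthogonal to q, any ε below the computed bound keeps every crafted key ahead of all benign keys, so a client holding T must select all n injected prompts.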
The authors formalize the attack as a security game Exp_AMI(A) and define the adversary’s advantage as Adv_AMI(A) = 2·Pr[Exp_AMI(A) = 1] − 1, i.e., how much better the adversary does than randomly guessing the membership bit b.
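When the membership bit b is a fair coin, the game-based advantage 2·Pr[win] − 1 equals the familiar TPR − FPR gap, which is easy to estimate empirically. A minimal sketch (the function name and this estimator are illustrative, not taken from the paper):

```python
import numpy as np

def empirical_advantage(guesses, labels):
    """Estimate Adv_AMI = Pr[guess=1 | b=1] - Pr[guess=1 | b=0] (TPR - FPR).
    For a uniformly random membership bit this equals 2*Pr[adversary wins] - 1."""
    guesses = np.asarray(guesses, dtype=float)
    labels = np.asarray(labels)
    tpr = guesses[labels == 1].mean()   # rate of "member" calls on true members
    fpr = guesses[labels == 0].mean()   # rate of "member" calls on non-members
    return tpr - fpr
```

A perfect attacker scores 1.0, a constant or random guesser 0.0, which is the scale on which the paper's "consistently high advantage" claim can be read.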