Traceable Black-box Watermarks for Federated Learning
Due to the distributed nature of Federated Learning (FL) systems, each local client has access to the global model, which poses a critical risk of model leakage. Existing works have explored injecting watermarks into local models to enable intellectual property protection. However, these methods either focus on non-traceable watermarks or traceable but white-box watermarks. We identify a gap in the literature regarding the formal definition of traceable black-box watermarking and the formulation of the problem of injecting such watermarks into FL systems. In this work, we first formalize the problem of injecting traceable black-box watermarks into FL. Based on the problem, we propose a novel server-side watermarking method, $\mathbf{TraMark}$, which creates a traceable watermarked model for each client, enabling verification of model leakage in black-box settings. To achieve this, $\mathbf{TraMark}$ partitions the model parameter space into two distinct regions: the main task region and the watermarking region. Subsequently, a personalized global model is constructed for each client by aggregating only the main task region while preserving the watermarking region. Each model then learns a unique watermark exclusively within the watermarking region using a distinct watermark dataset before being sent back to the local client. Extensive results across various FL systems demonstrate that $\mathbf{TraMark}$ ensures the traceability of all watermarked models while preserving their main task performance. The code is available at https://github.com/JiiahaoXU/TraMark.
💡 Research Summary
Federated Learning (FL) enables collaborative model training without sharing raw data, but distributing the global model to all participants creates a serious risk of model leakage. Existing watermarking approaches either embed signatures directly into model parameters (parameter-based), which requires white-box access for verification, or use backdoor triggers (black-box) that can be verified via API calls. While black-box methods avoid the need for access to model parameters, they generally do not provide traceability, i.e., the ability to pinpoint which client leaked a model, and many rely on modifying each client's local training pipeline, making them vulnerable to tampering.
This paper first formalizes the problem of traceable black‑box watermarking in FL. The goal is to inject a distinct watermark into the model delivered to each client, such that (1) the main learning objective is preserved, (2) each watermark can be verified in a black‑box setting, and (3) watermarks are sufficiently different to avoid collisions, guaranteeing traceability. The authors introduce a set of definitions (watermark dataset, black‑box watermark, watermark collision, traceability) and present an optimization problem that jointly minimizes the average FL loss and the watermark loss while enforcing a divergence constraint between any pair of watermarks.
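The joint objective described above can be sketched in LaTeX as follows. The notation here ($\lambda$, $\delta$, the divergence $d$, and the index conventions) is illustrative and not taken verbatim from the paper:

```latex
\min_{\{W_i\}_{i=1}^{N}} \;
\frac{1}{N}\sum_{i=1}^{N}
\Big[ \mathcal{L}_{\mathrm{FL}}\big(W_i;\, D_i\big)
      + \lambda\, \mathcal{L}_{\mathrm{wm}}\big(W_i;\, D_w^i\big) \Big]
\quad \text{s.t.} \quad
d\big(f(W_i;\, D_w^i),\, f(W_j;\, D_w^j)\big) \ge \delta
\;\; \forall\, i \ne j
```

Here $\mathcal{L}_{\mathrm{FL}}$ is the per-client main-task loss, $\mathcal{L}_{\mathrm{wm}}$ the watermark loss on client $i$'s watermark dataset $D_w^i$, and the constraint enforces a minimum pairwise divergence between watermark behaviors so that no two clients' watermarks collide.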
To solve this problem, the authors propose TraMark, a server‑side algorithm that can be plugged into standard FedAvg. The key idea is to partition the model’s parameter space into a main‑task region (M_m) and a watermarking region (M_w). During each communication round, the server performs masked aggregation: it averages client updates only over M_m, leaving M_w untouched (or applying a separate aggregation rule). After aggregation, the server fine‑tunes each client‑specific model in the watermarking region using a dedicated watermark dataset D_w_i. The fine‑tuning uses a small learning rate η_w and is restricted to parameters selected by M_w, ensuring that the main‑task performance is not degraded. Algorithm 1 details the per‑client watermark injection loop, and the full FL workflow with TraMark is given in the appendix.
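The masked-aggregation and watermark-injection steps can be sketched as below. This is a minimal NumPy illustration under assumed conventions (a boolean mask that is `True` on the main-task region M_m and `False` on the watermarking region M_w; `wm_grads_fn` standing in for a gradient computed on the watermark dataset), not the authors' actual implementation:

```python
import numpy as np

def masked_aggregate(client_updates, mask, prev_model):
    """Average client models only over the main-task region (mask == True);
    positions in the watermarking region keep their previous values."""
    avg = np.mean(client_updates, axis=0)
    return np.where(mask, avg, prev_model)

def inject_watermark(model, mask, wm_grads_fn, eta_w, steps):
    """Fine-tune only the watermarking region (mask == False) for a few
    steps with a small learning rate eta_w, leaving M_m untouched."""
    for _ in range(steps):
        g = wm_grads_fn(model)                     # gradient on D_w^i (assumed)
        model = model - eta_w * np.where(mask, 0.0, g)
    return model
```

A round then consists of calling `masked_aggregate` once per client (with that client's preserved watermark region as `prev_model`) followed by `inject_watermark` with that client's distinct watermark dataset, before sending the personalized model back.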
The authors conduct extensive experiments on image classification (CIFAR‑10, FEMNIST, CelebA) under both IID and non‑IID client distributions, varying the number of clients (10–100) and communication rounds (200–500). Results show that the average drop in main‑task accuracy is only 0.54 % compared with vanilla FedAvg, while watermark verification accuracy exceeds 95 % across all settings. Watermark collisions are observed in less than 1 % of cases, confirming the effectiveness of the distinct D_w_i design and the masked aggregation strategy. Ablation studies explore the impact of the mask size, the number of watermarking iterations τ_w, and the learning rate η_w, revealing a clear trade‑off: larger watermark regions improve traceability but can slightly hurt task performance if too large.
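Black-box verification as described can be illustrated with a small sketch: query the suspect model on each client's watermark set and attribute the leak to the client whose watermark accuracy clears a threshold. The function names, the dictionary layout, and the 0.95 threshold are assumptions for illustration, not the paper's verification protocol verbatim:

```python
def verify_leaker(suspect_predict, client_wm_sets, threshold=0.95):
    """Attribute a leaked model via black-box queries only.

    suspect_predict: callable mapping one input to a predicted label.
    client_wm_sets: dict of client_id -> (inputs, target_labels).
    Returns (client_id, accuracy) if some client's watermark accuracy
    reaches the threshold, else (None, best_accuracy).
    """
    best_client, best_acc = None, 0.0
    for cid, (inputs, targets) in client_wm_sets.items():
        preds = [suspect_predict(x) for x in inputs]
        acc = sum(p == t for p, t in zip(preds, targets)) / len(targets)
        if acc > best_acc:
            best_client, best_acc = cid, acc
    if best_acc >= threshold:
        return best_client, best_acc
    return None, best_acc
```

Because each client's watermark dataset is distinct and collisions are rare, at most one client's watermark set should score highly on a leaked model.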
A robustness evaluation simulates malicious clients that discard or corrupt their watermark dataset. Because watermark injection occurs entirely on the server and only the watermarking region is altered, the watermarks remain intact and can still be detected during verification, addressing a major weakness of prior client‑side schemes.
In summary, TraMark delivers the first practical solution for traceable black‑box watermarking in federated learning. By decoupling the main learning parameters from the watermark parameters and performing all watermark operations on the trusted server, it preserves model utility, prevents watermark collisions, and enables reliable attribution of leaked models without requiring white‑box access. The approach is compatible with existing FL pipelines and opens avenues for future work on multimodal FL, asynchronous updates, and continual learning scenarios.