Holmes: Towards Effective and Harmless Model Ownership Verification to Personalized Large Vision Models via Decoupling Common Features
Large vision models (LVMs) achieve remarkable performance in various downstream tasks, primarily by personalizing pre-trained models through fine-tuning with private and valuable local data, which makes the personalized model a valuable intellectual property. As in the era of traditional DNNs, model stealing attacks also pose significant risks to LVMs. However, this paper reveals that most existing defense methods, developed for traditional DNNs and typically designed for models trained from scratch, either introduce additional security risks, are prone to misjudgment, or are simply ineffective for fine-tuned models. To alleviate these problems, this paper proposes a harmless model ownership verification method for personalized LVMs that decouples similar common features. In general, the method consists of three main stages. In the first stage, we create shadow models that retain the common features of the victim model while disrupting its dataset-specific features. We then represent the dataset-specific features of the victim model by computing the output differences between the shadow and victim models, without altering the victim model or its training process. Next, a meta-classifier is trained to identify stolen models by determining whether suspicious models contain the dataset-specific features of the victim. In the third stage, we conduct model ownership verification via hypothesis testing to mitigate randomness and enhance robustness. Extensive experiments on benchmark datasets verify the effectiveness of the proposed method in simultaneously detecting different types of model stealing. Our code is available at https://github.com/zlh-thu/Holmes.
💡 Research Summary
The paper addresses the pressing problem of protecting the intellectual property of personalized large vision models (LVMs) against model‑stealing attacks. While many existing model‑ownership verification (MOV) techniques—watermarking and fingerprinting—have been proposed for models trained from scratch, they either compromise model reliability (by embedding malicious backdoors) or suffer from high false‑positive rates when the victim and suspect models share common features, especially in the fine‑tuned LVM scenario where a powerful foundation model already encodes abundant generic representations.
Holmes introduces a novel, intrinsically harmless MOV framework that explicitly decouples “common features” (those learned by the foundation model and shared across many downstream tasks) from “dataset‑specific features” (the unique patterns encoded during fine‑tuning on private data). The method proceeds in three stages.
1. Shadow Model Construction – Two shadow models are derived from the same foundation model without modifying the victim model.
   - Poisoned Shadow: The most confidently learned personalization samples (lowest loss) are poisoned with a label‑inconsistent backdoor (e.g., BadNets). This disrupts the victim's dataset‑specific features while preserving common features, because the poisoned model still performs well on benign test data.
   - Benign Shadow: The foundation model is fine‑tuned on the personalization set excluding the well‑learned samples, yielding a model that shares the victim's common features but possesses a different set of dataset‑specific features.
2. Meta‑Classifier Training – Two output‑difference vectors are computed: (i) the difference between the victim and the poisoned shadow (captures victim‑specific features) and (ii) the difference between the benign and poisoned shadows (captures only common features). These vectors serve as inputs to a binary meta‑classifier that learns to distinguish whether a suspect model contains the victim's dataset‑specific imprint while being robust to models that merely share common representations.
3. Ownership Verification via Hypothesis Testing – The meta‑classifier's score on a suspect model is subjected to a statistical hypothesis test (null hypothesis: the model lacks the victim's dataset‑specific features). By aggregating scores over multiple queries, the test mitigates randomness and yields a p‑value; a value below a predefined threshold confirms ownership.
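The data preparation for stage one can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's implementation: the personalization set, per-sample losses, the poisoning fraction, and the trigger geometry are all hypothetical stand-ins, and the BadNets-style trigger is simplified to a white corner patch with labels flipped to a fixed target class.

```python
import numpy as np

rng = np.random.default_rng(1)
N, H, W, C, N_CLASSES = 100, 32, 32, 3, 10
POISON_FRACTION = 0.1  # assumed fraction of well-learned samples to poison

# Hypothetical stand-ins: a personalization set with per-sample training
# losses already computed under the fine-tuned victim model.
images = rng.random((N, H, W, C), dtype=np.float32)
labels = rng.integers(0, N_CLASSES, size=N)
losses = rng.random(N)

# Select the most confidently learned samples (lowest loss) for poisoning.
n_poison = int(POISON_FRACTION * N)
poison_idx = np.argsort(losses)[:n_poison]

# Simplified BadNets-style trigger: a white patch in the bottom-right corner,
# with labels flipped to a fixed target class (label-inconsistent poisoning).
TARGET_CLASS, PATCH = 0, 3
poisoned_images = images.copy()
poisoned_images[poison_idx, -PATCH:, -PATCH:, :] = 1.0
poisoned_labels = labels.copy()
poisoned_labels[poison_idx] = TARGET_CLASS
# -> fine-tune the foundation model on (poisoned_images, poisoned_labels)
#    to obtain the poisoned shadow.

# Benign-shadow training set: the personalization set excluding the
# well-learned samples.
keep = np.setdiff1d(np.arange(N), poison_idx)
benign_images, benign_labels = images[keep], labels[keep]
# -> fine-tune the foundation model on (benign_images, benign_labels)
#    to obtain the benign shadow.
```

The key design point survives even in this toy: the victim model itself is never touched; only the two shadow training sets are constructed from its (already available) personalization data.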
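The output-difference construction in stage two can be sketched in the same spirit. Again a toy, not the paper's method: real model outputs are replaced by synthetic vectors in which the victim's dataset-specific imprint appears as a fixed signal direction, and the learned meta-classifier is replaced by a minimal nearest-mean linear classifier; all names (`diff_vector`, `meta_classify`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 40          # flattened size of an output-difference vector (queries x classes)
N_PER_CLASS = 50

# Assumption for the toy: the victim's dataset-specific imprint shows up as a
# consistent direction in (victim - poisoned shadow) output differences, while
# (benign shadow - poisoned shadow) differences contain only noise from the
# shared common features.
specific_signal = rng.normal(size=DIM)

def diff_vector(contains_specific: bool) -> np.ndarray:
    noise = rng.normal(scale=0.5, size=DIM)
    return specific_signal + noise if contains_specific else noise

# Training set for the binary meta-classifier: label 1 = contains the victim's
# dataset-specific features, label 0 = common features only.
pos = np.stack([diff_vector(True) for _ in range(N_PER_CLASS)])
neg = np.stack([diff_vector(False) for _ in range(N_PER_CLASS)])

# Minimal linear meta-classifier (nearest-mean rule), standing in for the
# trained classifier of the paper.
w = pos.mean(axis=0) - neg.mean(axis=0)
b = -0.5 * (pos.mean(axis=0) + neg.mean(axis=0)) @ w

def meta_classify(v: np.ndarray) -> int:
    # 1 = suspect carries the victim's dataset-specific imprint (stolen)
    return int(v @ w + b > 0)
```

A suspect stolen from the victim yields a difference vector containing the specific signal and is flagged (`meta_classify(diff_vector(True)) == 1`), while an independently trained model that merely shares common features is not.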
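Stage three can likewise be sketched. Assuming the meta-classifier emits one score per query, a one-sided test of the null hypothesis "the suspect lacks the victim's dataset-specific features" can be approximated as below; the example scores, the threshold `tau`, and the normal approximation to the t statistic are all illustrative assumptions, not the paper's exact test.

```python
import math
import statistics

ALPHA = 0.05  # significance level (may need domain-specific tuning)

def ownership_p_value(scores, tau=0.5):
    """One-sided test of H0: mean meta-classifier score <= tau.

    Uses a normal approximation to the t statistic, adequate for the
    multi-query setting (tens of queries or more).
    """
    m = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    z = (mean - tau) * math.sqrt(m) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) under H0

# Hypothetical score sequences over 10 verification queries: a stolen model
# scores consistently high; an independent model hovers near chance.
stolen_scores = [0.9, 0.85, 0.92, 0.88, 0.95, 0.91, 0.87, 0.93, 0.9, 0.89]
independent_scores = [0.4, 0.55, 0.35, 0.6, 0.45, 0.5, 0.42, 0.58, 0.48, 0.52]

print(ownership_p_value(stolen_scores) < ALPHA)       # True  -> claim ownership
print(ownership_p_value(independent_scores) < ALPHA)  # False -> no claim
```

Aggregating over multiple queries is what makes the claim statistically sound: a single high score could be noise, but a low p-value over many queries rejects the null hypothesis at the chosen significance level.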
The authors evaluate Holmes on several benchmark datasets (CIFAR‑10, ImageNet‑subsets, medical imaging) and on state‑of‑the‑art LVMs such as CLIP, BEiT, and ViT. They consider six stealing scenarios: direct parameter copying, fine‑tuning, knowledge distillation with/without training data, data‑free distillation, and advanced distilled fine‑tuning (DFT). Compared with the latest harmless watermarking method MOVE, conventional fingerprinting, and various active defenses (output rounding, noise injection, query limiting), Holmes consistently achieves higher detection accuracy (≈92 %) and markedly lower false‑positive rates (≈1.8 %). Importantly, because the victim model is never altered, its original performance remains unchanged—a stark contrast to watermarking approaches that degrade accuracy by 3–5 %.
Key contributions include:
- A new fingerprinting paradigm that suppresses interference from shared common features, thereby reducing misjudgment.
- A fully non‑invasive verification pipeline that leverages shadow models to extract dataset‑specific signals without any modification to the released victim model.
- Integration of a meta‑classifier with hypothesis testing to provide statistically sound ownership claims across diverse stealing attacks.
- Extensive empirical validation and open‑source release, facilitating reproducibility and future extensions to generative vision tasks (e.g., image captioning).
Limitations are acknowledged: the current study focuses on classification; extending to detection, segmentation, or multimodal tasks may require tailored shadow‑model designs. The reliance on backdoor‑style poisoning for the poisoned shadow could be challenged by advanced backdoor‑removal techniques, and the choice of significance level in hypothesis testing may need domain‑specific tuning.
In conclusion, Holmes offers a practical, effective, and harmless solution for ownership verification of personalized large vision models, bridging the gap left by prior methods and setting a solid foundation for future research on IP protection in the era of foundation‑model‑driven AI.