SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting

📝 Original Info

  • Title: SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting
  • ArXiv ID: 2512.03620
  • Date: 2025-12-03
  • Authors: Hanxiu Zhang, Yue Zheng

📝 Abstract

The protection of Intellectual Property (IP) in Large Language Models (LLMs) represents a critical challenge in contemporary AI research. While fingerprinting techniques have emerged as a fundamental mechanism for detecting unauthorized model usage, existing methods, whether behavior-based or structural, suffer from vulnerabilities such as false claim attacks or susceptibility to weight manipulations. To overcome these limitations, we propose SELF, a novel intrinsic weight-based fingerprinting scheme that eliminates dependency on input and inherently resists false claims. SELF achieves robust IP protection through two key innovations: 1) unique, scalable and transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weights, and 2) effective neural network-based fingerprint similarity comparison based on few-shot learning and data augmentation. Experimental results demonstrate that SELF maintains high IP infringement detection accuracy while showing strong robustness against various downstream modifications, including quantization, pruning, and fine-tuning attacks. Our code is available at https://github.com/HanxiuZhang/SELF_v2.

💡 Deep Analysis

Figure 1

📄 Full Content

SELF: A ROBUST SINGULAR VALUE AND EIGENVALUE APPROACH FOR LLM FINGERPRINTING

Hanxiu Zhang, Yue Zheng∗
The Chinese University of Hong Kong, Shenzhen
hanxiuzhang@link.cuhk.edu.cn, zhengyue@cuhk.edu.cn

ABSTRACT

The protection of Intellectual Property (IP) in Large Language Models (LLMs) represents a critical challenge in contemporary AI research. While fingerprinting techniques have emerged as a fundamental mechanism for detecting unauthorized model usage, existing methods, whether behavior-based or structural, suffer from vulnerabilities such as false claim attacks or susceptibility to weight manipulations. To overcome these limitations, we propose SELF, a novel intrinsic weight-based fingerprinting scheme that eliminates dependency on input and inherently resists false claims. SELF achieves robust IP protection through two key innovations: 1) unique, scalable and transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weights, and 2) effective neural network-based fingerprint similarity comparison based on few-shot learning and data augmentation. Experimental results demonstrate that SELF maintains high IP infringement detection accuracy while showing strong robustness against various downstream modifications, including quantization, pruning, and fine-tuning attacks. Our code is available at github.com/HanxiuZhang/SELF_v2.

Keywords: Large Language Model · Intellectual Property Protection · Fingerprinting · Singular Values · Eigenvalues

1 Introduction

Large language models (LLMs) are increasingly being adopted as versatile tools to enhance productivity in various fields, including medical assistance ([1]) and code generation ([2]). Developing a functional LLM requires substantial investments, including high-quality datasets, significant computational resources, and specialized human expertise.
Consequently, protecting the intellectual property (IP) of LLMs is of paramount importance ([3]), particularly in the current era, where open-source trends clash with model creators' need to maintain naming conventions for attribution in derivative works.

Current model IP infringement detection methods primarily fall into two categories: watermarking and fingerprinting. Watermarking approaches invasively embed identifiable features (watermarks) into target models while trying to preserve their original functionality ([4, 5]). In contrast, fingerprinting methods extract unique model identifiers without modifying the model, either by analyzing the model’s input-output behavioral patterns ([6]) (i.e., behavior fingerprinting) or by analyzing structural information (i.e., structural fingerprinting) such as weight distributions ([7]), intermediate representations ([8]), or gradient profiles ([9]). Compared to watermarking-based methods, fingerprinting schemes eliminate the need for retraining and avoid the potential performance degradation associated with watermark insertion ([10]).

Despite these advantages, existing fingerprinting methods face critical limitations. Behavior-based techniques are vulnerable to false claim attacks ([11]), wherein malicious actors can falsely claim ownership of independently trained models by crafting (transferable) adversarial samples. Although [12] proposes to mitigate the attack by constructing fingerprints from targeted adversarial examples, the risk persists because such adversarial examples can still be transferable, albeit with greater difficulty. Structural approaches analyze a model's internal parameters but lack robustness against weight manipulations such as permutation or linear mapping. For schemes like HuRef ([7]), where the input actively participates in fingerprint computation, we further extend the scope of the false claim attack, since a malicious accuser can manipulate ownership verification results through carefully crafted inputs.
Under this broader definition, we conducted a false claim attack on the HuRef scheme and successfully manipulated the similarity score output (see Appendix B).

∗Corresponding author. arXiv:2512.03620v1 [cs.CR] 3 Dec 2025

To address these issues, we propose a structural fingerprinting method named SELF, which depends purely on the model weights. Figure 1 describes SELF’s pipeline. The owner first extracts a fingerprint from the target model and trains a Similarity Network (SimNet) for verification. If the model is stolen, the owner can detect piracy from SimNet’s high similarity output. SELF comprises two key components: (1) Fingerprint Extraction, which derives unique, robust and scalable fingerprints from model weights; and (2) Similarity Computation, where a neural network learns fingerprint patterns to enable robust and efficient similarity assessment.

Figure 1: IP infringement detection pipeline using SELF.

In the fingerprint extraction module, we address potential model weight tampering caused by transformation attacks (e.g., permutation and linear mapping ([7])) by identifying attributes that remain invariant under those transformations.
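The idea of looking for invariant attributes can be illustrated with two standard linear algebra facts: singular values are unchanged when rows and columns are permuted, and eigenvalues are unchanged under a similarity transformation by any invertible matrix. The sketch below is only an illustration under stated assumptions and is not the authors' extraction procedure: the matrix `w` is a random stand-in rather than a real attention weight, and the helper names (`singular_values`, `same_spectrum`, `demo`) are hypothetical. Which weight matrices or products SELF actually decomposes is not specified in this excerpt.

```python
import numpy as np


def singular_values(m):
    """Return the singular values of m (NumPy orders them descending)."""
    return np.linalg.svd(m, compute_uv=False)


def same_spectrum(a, b, tol=1e-8):
    """Check that every eigenvalue of a has a close counterpart in b, and vice versa."""
    ev_a, ev_b = np.linalg.eigvals(a), np.linalg.eigvals(b)
    dist = np.abs(ev_a[:, None] - ev_b[None, :])
    return bool(dist.min(axis=1).max() < tol and dist.min(axis=0).max() < tol)


def demo(d=8, seed=0):
    rng = np.random.default_rng(seed)

    # Assumption: a random square matrix stands in for a weight-derived matrix.
    w = rng.standard_normal((d, d))

    # Permutation "attack": shuffle rows and columns with permutation matrices.
    p_rows = np.eye(d)[rng.permutation(d)]
    p_cols = np.eye(d)[rng.permutation(d)]
    w_perm = p_rows @ w @ p_cols.T

    # Linear-mapping "attack": similarity transform by an invertible matrix c.
    # Adding d * I keeps c comfortably away from singular in this toy setup.
    c = rng.standard_normal((d, d)) + d * np.eye(d)
    w_sim = np.linalg.inv(c) @ w @ c

    return {
        "weights_changed_by_permutation": not np.allclose(w, w_perm),
        "singular_values_kept_by_permutation": bool(
            np.allclose(singular_values(w), singular_values(w_perm))
        ),
        "weights_changed_by_similarity": not np.allclose(w, w_sim),
        "eigenvalues_kept_by_similarity": same_spectrum(w, w_sim),
    }


if __name__ == "__main__":
    for name, ok in demo().items():
        print(f"{name}: {ok}")
```

In both cases the raw entries of the matrix change, so a direct weight comparison would fail, while the spectral quantities stay the same up to floating-point error. That is the general property a transformation-invariant fingerprint relies on.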

Reference

This content is AI-processed based on open access ArXiv data.
