FNF: Functional Network Fingerprint for Large Language Models
The development of large language models (LLMs) is costly and has significant commercial value. Consequently, preventing unauthorized appropriation of open-source LLMs and protecting developers’ intellectual property rights have become critical challenges. In this work, we propose the Functional Network Fingerprint (FNF), a training-free, sample-efficient method for detecting whether a suspect LLM is derived from a victim model, based on the consistency of their functional network activity. We demonstrate that models that share a common origin, even with differences in scale or architecture, exhibit highly consistent patterns of neuronal activity within their functional networks across diverse input samples. In contrast, models trained independently on distinct data or with different objectives fail to preserve such activity alignment. Unlike conventional approaches, our method requires only a few samples for verification, preserves model utility, and remains robust to common model modifications (such as fine-tuning, pruning, and parameter permutation), as well as to comparisons across diverse architectures and dimensionalities. FNF thus provides model owners and third parties with a simple, non-invasive, and effective tool for protecting LLM intellectual property. The code is available at https://github.com/WhatAboutMyStar/LLM_ACTIVATION.
💡 Research Summary
The paper introduces Functional Network Fingerprint (FNF), a training‑free, sample‑efficient technique for determining whether a suspect large language model (LLM) is derived from a particular “victim” model. Unlike conventional watermarking or parameter‑similarity approaches, FNF relies on the intrinsic functional activity of the networks themselves, making it non‑invasive and robust to common post‑hoc modifications such as fine‑tuning, pruning, or parameter permutation.
The authors draw an analogy between functional brain networks (FBNs) and the co‑activation patterns observed inside transformer‑based LLMs. For a given set of input sequences, they extract the hidden‑state activations from each transformer block (a matrix of shape tokens × hidden‑dim, one row per token). These activation matrices from multiple inputs are concatenated and fed into a spatial independent component analysis pipeline (CanICA). After PCA whitening, FastICA decomposes the data into K independent spatial components, each representing a putative functional network—a set of neurons that tend to co‑activate across tasks.
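The decomposition step can be sketched as follows. This is a minimal approximation, not the paper's exact pipeline: the paper uses CanICA (from the neuroimaging toolkit nilearn), while here scikit-learn's `FastICA` with built-in whitening stands in for the PCA-whitening-plus-FastICA stages; the function name and the choice of K are illustrative.

```python
import numpy as np
from sklearn.decomposition import FastICA

def extract_networks(activation_mats, k=10, seed=0):
    """Decompose concatenated hidden-state activations into k spatial
    components (putative functional networks).

    activation_mats: list of (tokens_i, hidden_dim) arrays, one per
    input sample, all taken from the same model/layer.
    Returns a (k, hidden_dim) array: one spatial map per component,
    whose large-magnitude entries mark co-activating neurons.
    """
    X = np.vstack(activation_mats)                 # (total_tokens, hidden_dim)
    ica = FastICA(n_components=k, whiten="unit-variance",
                  random_state=seed, max_iter=1000)
    ica.fit(X)                                     # whitening + FastICA in one step
    return ica.components_                         # (k, hidden_dim) spatial maps
```

Treating tokens as observations and neurons as the spatial dimension makes this a *spatial* ICA: each component is a map over neurons, mirroring how fMRI analyses recover spatial brain networks.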
Each component is thresholded to produce a binary mask selecting the most salient neurons. For every input sample, the mean activation of the masked neurons yields a one‑dimensional time series (a “functional time course”) for each network. To compare two models A and B, the authors compute Spearman rank correlations between every pair of time courses (i from A, j from B) across all N input samples, then average these correlations to obtain a K × K similarity matrix R̄. High average correlations, especially along the diagonal, indicate that the two models share a common functional organization and are therefore likely to belong to the same lineage. The use of Spearman rather than Pearson correlation emphasizes trend consistency over absolute magnitude, providing robustness against monotonic transformations caused by scaling or re‑parameterization.
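The masking, time-course extraction, and similarity computation described above can be sketched as below. The quantile threshold `q` is an assumed stand-in for the paper's unspecified threshold, and the sketch assumes both models receive identically tokenized inputs so that paired time courses have equal length:

```python
import numpy as np
from scipy.stats import spearmanr

def binarize(components, q=0.75):
    """Per component, keep the top (1-q) fraction of neurons by
    absolute loading. components: (K, hidden) -> (K, hidden) bool."""
    thr = np.quantile(np.abs(components), q, axis=1, keepdims=True)
    return np.abs(components) >= thr

def time_courses(act, masks):
    """Mean activation over each network's masked neurons, per token.
    act: (tokens, hidden); masks: (K, hidden) -> (K, tokens)."""
    return np.stack([act[:, m].mean(axis=1) for m in masks])

def similarity_matrix(acts_A, acts_B, masks_A, masks_B):
    """Average Spearman correlation between every (network i of A,
    network j of B) pair of time courses over the N shared input
    samples, yielding the K x K matrix R-bar."""
    K = masks_A.shape[0]
    R = np.zeros((K, K))
    for a, b in zip(acts_A, acts_B):       # same input fed to both models
        tcA = time_courses(a, masks_A)
        tcB = time_courses(b, masks_B)
        for i in range(K):
            for j in range(K):
                rho, _ = spearmanr(tcA[i], tcB[j])
                R[i, j] += rho
    return R / len(acts_A)
```

Because Spearman correlation depends only on rank order, any monotonic rescaling of one model's activations leaves R̄ unchanged, which is exactly the robustness property the authors rely on.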
Experimental evaluation uses 10–20 WikiText‑2 sentences as stimuli and examines a broad spectrum of models: different scales within the same family (e.g., Qwen‑3B vs. Qwen‑7B), different families (e.g., LLaMA vs. GPT‑Neo), and models that have undergone structural expansions such as added layers or enlarged MLP dimensions (“weight repackaging”). Results show strong cross‑sample functional alignment for models sharing a training lineage, even when their architectures or hidden dimensions differ, while unrelated models exhibit low alignment. Moreover, the method remains effective after fine‑tuning, pruning, and random permutation of parameters, confirming its resilience to typical obfuscation tactics.
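The lineage decision implied by these results can be reduced to a simple threshold test on R̄. The row-max matching and the threshold value below are illustrative assumptions, not taken from the paper (which reports alignment qualitatively, "especially along the diagonal"); matching each of A's networks to its best-correlated counterpart in B avoids assuming the components come out in the same order:

```python
import numpy as np

def same_lineage(R_bar, tau=0.6):
    """Illustrative decision rule: match each functional network of
    model A to its best-correlated network in model B (row-wise max
    of |R-bar|), then compare the mean matched correlation to an
    assumed threshold tau."""
    matched = np.abs(R_bar).max(axis=1).mean()
    return bool(matched >= tau)
```

With only 10–20 stimulus sentences needed to estimate R̄, this check is cheap enough to run routinely against a pool of suspect models.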
The paper also discusses limitations. ICA assumes linear mixing of sources, whereas LLM activations are generated by highly non‑linear transformations; thus, some functional relationships may be missed. The choice of K (number of components) and the threshold for binary masks are hyper‑parameters that can influence performance and were fixed in the study rather than tuned per model. Additionally, the method’s discriminative power for models that share architecture but differ only in training data or hyper‑parameters remains to be fully explored.
In summary, FNF offers a novel, dynamic‑based fingerprint that captures the “functional DNA” of LLMs. By focusing on co‑activation patterns rather than static weights, it provides a non‑invasive, sample‑efficient, and architecture‑agnostic tool for intellectual‑property protection and model provenance verification. Future work could extend the approach with non‑linear dimensionality‑reduction techniques, adaptive component selection, or multi‑modal stimuli to further enhance robustness and applicability across the rapidly expanding LLM ecosystem.