Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models
📝 Abstract
Graph Foundation Models (GFMs) are pre-trained on diverse source domains and adapted to unseen targets, enabling broad generalization for graph machine learning. Despite that GFMs have attracted considerable attention recently, their vulnerability to backdoor attacks remains largely underexplored. A compromised GFM can introduce backdoor behaviors into downstream applications, posing serious security risks. However, launching backdoor attacks against GFMs is non-trivial due to three key challenges. (1) Effectiveness: Attackers lack knowledge of the downstream task during pre-training, complicating the assurance that triggers reliably induce misclassifications into desired classes. (2) Stealthiness: The variability in node features across domains complicates trigger insertion that remains stealthy. (3) Persistence: Downstream fine-tuning may erase backdoor behaviors by updating model parameters. To address these challenges, we propose GFM-BA, a novel Backdoor Attack model against Graph Foundation Models. Specifically, we first design a label-free trigger association module that links the trigger to a set of prototype embeddings, eliminating the need for knowledge about downstream tasks to perform backdoor injection. Then, we introduce a node-adaptive trigger generator, dynamically producing node-specific triggers, reducing the risk of trigger detection while reliably activating the backdoor. Lastly, we develop a persistent backdoor anchoring module that firmly anchors the backdoor to fine-tuning-insensitive parameters, enhancing the persistence of the backdoor under downstream adaptation. Extensive experiments demonstrate the effectiveness, stealthiness, and persistence of GFM-BA.
📄 Content
Graph Foundation Models (GFMs) are designed to be pre-trained on graph data from diverse domains and subsequently adapted to a wide range of downstream tasks in the target domain (Liu et al. 2025; Mao et al. 2024; Shi et al. 2024a,b). Existing efforts towards GFMs (Zhao et al. 2024; Yu et al. 2024, 2025; Wang et al. 2024b) have demonstrated strong knowledge transfer from pre-training source domains to target domains, achieving superior performance. While the pre-training and adaptation paradigm (Zi et al. 2024; Tang et al. 2024a; He and Hooi 2024; Lachi et al. 2024) has driven the success of GFMs, it also introduces new security vulnerabilities, particularly backdoor attacks (i.e., inserting backdoors into the model that cause it to misbehave when encountering certain triggers). For GFMs, attackers can exploit the pre-training stage to inject backdoors and release compromised pre-trained GNNs to the public. Downstream users who adopt these pre-trained models unknowingly inherit the backdoor, exposing their downstream applications to targeted manipulation. These threats pose risks to critical applications of GFMs such as drug discovery (Bongini, Bianchini, and Scarselli 2021) and financial fraud detection (Cheng et al. 2020).
Backdoor attacks on traditional GNNs have been extensively studied (Zhang et al. 2021; Xi et al. 2021; Dai et al. 2023; Zheng et al. 2023; Xu, Xue, and Picek 2021). However, backdoor attacks against traditional GNNs and against GFMs differ fundamentally. As shown in Figure 1, existing backdoor attacks against GNNs rest on three presumed conditions: (1) labels for downstream tasks are accessible during the backdoor injection phase; (2) the training and downstream graphs originate from the same domain; and (3) the backdoored model remains unchanged during downstream use. In the context of GFMs, none of these conditions necessarily holds, leading to three key challenges for designing backdoor attacks against GFMs. (1) Effectiveness: During the pre-training stage, downstream task knowledge is inaccessible. How can attackers ensure that the injected trigger consistently induces a specific label aligned with their intent? (2) Stealthiness: The distribution and semantics of node features can vary significantly across graph domains (Mao et al. 2024; Shi et al. 2024a). How can triggers be designed to remain stealthy across diverse downstream domains? (3) Persistence: Downstream adaptation may modify the learned model parameters, thereby erasing the backdoor effect (a phenomenon known as backdoor forgetting (Gu et al. 2023)). How can attackers embed backdoors that remain effective after downstream fine-tuning?
To address the aforementioned challenges, we propose GFM-BA, a novel model for performing Backdoor Attacks against Graph Foundation Models. First, to solve the effectiveness challenge and achieve label-specific manipulation without access to downstream knowledge, we design a label-free trigger association module, which links triggers to a set of prototype embeddings.
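The core idea of label-free trigger association, that a triggered node's embedding is pulled toward a fixed prototype in embedding space rather than toward a downstream label, can be illustrated with a minimal sketch. The names here (`association_loss`, the prototype list) are hypothetical and the formulation is a simplification, not the paper's actual objective:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association_loss(triggered_emb, prototypes):
    """Label-free association (illustrative): pull the triggered node's
    embedding toward its nearest prototype. Minimizing this loss ties the
    trigger to a fixed region of embedding space, so no downstream labels
    are needed at injection time."""
    sims = [cosine(triggered_emb, p) for p in prototypes]
    return 1.0 - max(sims)
```

After adaptation, whichever downstream class occupies the prototype's region of embedding space becomes the de facto target class, which is why no label knowledge is required during pre-training.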
[Figure 1: Backdoor injection during training vs. backdoor activation downstream.]

Our main contributions can be summarized as follows:
• We study backdoor attacks against Graph Foundation Models, highlighting significant trustworthiness concerns in the development of GFMs.
• We propose GFM-BA, a novel backdoor attack model for GFMs containing three tailored modules targeting the effectiveness, stealthiness, and persistence challenges.
• We conduct extensive experiments showing that GFM-BA consistently outperforms existing methods against three representative victim GFMs, demonstrating its superior performance.
Graph Foundation Models. Graph Foundation Models (GFMs) aim to capture generalizable graph knowledge that enables positive transfer across tasks and domains (Liu et al. 2025; Mao et al. 2024; Shi et al. 2024a; Wang et al. 2024b; Zi et al. 2024; Tang et al. 2024a; He and Hooi 2024; Lachi et al. 2024; Xia, Kao, and Huang 2024). They are typically pre-trained with self-supervised objectives, such as link prediction (Yu et al. 2024; Zhang and Chen 2018) or graph contrastive learning (Zhao et al. 2024; Yu et al. 2025), over multiple source datasets. The resulting model is then adapted to downstream tasks on target graphs through task-specific fine-tuning (Zhao et al. 2024; Hassani 2022; You et al. 2020) or prompting (Sun et al. 2022a,b, 2023; Fang et al. 2023; Tang et al. 2024b). For example, GCOPE (Zhao et al. 2024) mitigates negative transfer by introducing domain-specific virtual nodes that interconnect nodes across domains, aligning semantic patterns. MDGPT (Yu et al. 2024) introduces a two-stage prompting strategy that adapts to target domains by integrating unified multi-domain knowledge with domain-specific information. SAMGPT (Yu et al. 2025) further introduces structure tokens to align varying structural distributions. However, the trustworthiness of GFMs remains largely unexplored.
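For intuition on the self-supervised pre-training these GFMs rely on, the graph contrastive objective mentioned above is commonly an InfoNCE-style loss: two augmented views of the same node form a positive pair, and other nodes serve as negatives. The sketch below is a generic, simplified version of that loss, not any specific model's implementation; `tau` is the usual temperature hyperparameter:

```python
import math

def info_nce(anchor, positive, negatives, tau=0.5):
    """Simplified InfoNCE contrastive loss over embedding vectors.
    Low when the anchor is similar to its positive view and dissimilar
    to the negatives; this is the kind of label-free signal GFMs use
    during multi-domain pre-training."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u))
                      * math.sqrt(sum(b * b for b in v)))
    pos = math.exp(cos(anchor, positive) / tau)
    neg = sum(math.exp(cos(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Because such objectives never consult task labels, a backdoor injected at this stage must likewise operate without label knowledge, which is exactly the effectiveness challenge identified above.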