MAGPrompt: Message-Adaptive Graph Prompt Tuning for Graph Neural Networks
Pre-trained graph neural networks (GNNs) transfer well, but adapting them to downstream tasks remains challenging due to mismatches between pre-training objectives and task requirements. Graph prompt tuning offers a parameter-efficient alternative to fine-tuning, yet most methods only modify inputs or representations and leave message passing unchanged, limiting their ability to adapt neighborhood interactions. We propose message-adaptive graph prompt tuning, which injects learnable prompts into the message passing step to reweight incoming neighbor messages and add task-specific prompt vectors during message aggregation, while keeping the backbone GNN frozen. The approach is compatible with common GNN backbones and pre-training strategies, and applicable across downstream settings. Experiments on diverse node- and graph-level datasets show consistent gains over prior graph prompting methods in few-shot settings, while achieving performance competitive with fine-tuning in full-shot regimes.
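The mechanism the abstract describes — reweighting each incoming neighbor message and adding a task-specific prompt vector during aggregation, with the backbone frozen — can be sketched as follows. This is a minimal NumPy illustration under a simple sum aggregator; the function and variable names are ours, not the paper's.

```python
import numpy as np

def prompted_aggregate(h, neighbors, W, gates, prompt):
    """Illustrative sketch of message-adaptive aggregation.

    h:         (N, d) node features from the frozen backbone
    neighbors: list of neighbor-index arrays, one per node
    W:         (d, d) frozen message transform (not updated during tuning)
    gates:     per-node arrays of learned edge weights a_ij over neighbors
    prompt:    (d,) learnable task-specific prompt added at aggregation time
    """
    out = np.empty_like(h)
    for i, nbrs in enumerate(neighbors):
        msgs = h[nbrs] @ W                 # frozen neighbor messages
        out[i] = gates[i] @ msgs + prompt  # reweight messages, add prompt
    return out
```

Only `gates` and `prompt` would be trained; `W` and the features `h` come from the frozen pre-trained GNN.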
💡 Research Summary
The paper addresses a key limitation of existing graph prompt tuning methods: they leave the message‑passing mechanism of a pre‑trained Graph Neural Network (GNN) untouched, adapting only inputs, hidden representations, or graph topology. Because the core of GNNs is the aggregation of neighbor messages, a fixed aggregation rule can hinder adaptation when downstream tasks require different neighborhood mixing patterns than those encoded during pre‑training.
To overcome this, the authors propose MAGPrompt (Message‑Adaptive Graph Prompt), a framework that injects lightweight, learnable prompts directly into the message‑passing step while keeping the backbone parameters frozen. MAGPrompt consists of two complementary components:
- Message‑reweighting gate – For each edge (i, j), a scalar gate aᵢⱼ is computed by a lightweight attention‑style module. Node embeddings from the frozen backbone are projected into a low‑dimensional gating space (dimension dₐ), head‑wise attention scores are computed, softmaxed across the neighborhood, and then averaged over the heads. A smoothing hyper‑parameter β keeps aᵢⱼ within a bounded range.
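The gating procedure above can be sketched in NumPy. The projections, score scaling, and in particular the β‑smoothing form below (interpolation toward uniform neighbor weights) are our assumptions for illustration, since the paper's exact formulation is not reproduced here.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def message_gates(h, neighbors, Wq, Wk, beta=0.1):
    """Hypothetical sketch of the message-reweighting gate.

    h:         (N, d) node embeddings from the frozen backbone
    neighbors: list of neighbor-index arrays, one per node
    Wq, Wk:    (heads, d, d_a) learnable projections into the gating space
    beta:      smoothing hyper-parameter bounding the gates away from 0
    """
    heads, d, d_a = Wq.shape
    gates = []
    for i, nbrs in enumerate(neighbors):
        # project center node and neighbors head-wise into the gating space
        q = np.einsum('hda,d->ha', Wq, h[i])        # (heads, d_a)
        k = np.einsum('hda,nd->hna', Wk, h[nbrs])   # (heads, |N(i)|, d_a)
        scores = np.einsum('ha,hna->hn', q, k) / np.sqrt(d_a)
        att = softmax(scores)       # softmax across the neighborhood
        a = att.mean(axis=0)        # average over heads
        # assumed smoothing: interpolate toward uniform weights
        a = (1 - beta) * a + beta / len(nbrs)
        gates.append(a)
    return gates
```

Under this smoothing, each gate stays at least β/|N(i)| and the gates over a neighborhood still sum to one, so no neighbor message is suppressed entirely.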