Evolutionary Router Feature Generation for Zero-Shot Graph Anomaly Detection with Mixture-of-Experts
Zero-shot graph anomaly detection (GAD) has attracted increasing attention in recent years, yet the heterogeneity of graph structures, features, and anomaly patterns across graphs makes existing single-GNN methods insufficiently expressive to model diverse anomaly mechanisms. In this regard, Mixture-of-Experts (MoE) architectures provide a promising paradigm by integrating diverse GNN experts with complementary inductive biases, yet their effectiveness in zero-shot GAD is severely constrained by distribution shifts, leading to two key routing challenges. First, nodes often carry vastly different semantics across graphs, and routing directly on their raw features is prone to producing biased or suboptimal expert assignments. Second, as anomalous graphs often exhibit pronounced distributional discrepancies, existing router designs fall short of capturing domain-invariant routing principles that generalize beyond the training graphs. To address these challenges, we propose a novel MoE framework with evolutionary router feature generation (EvoFG) for zero-shot GAD. To enhance MoE routing, we propose an evolutionary feature generation scheme that iteratively constructs and selects informative structural features via an LLM-based generator and Shapley-guided evaluation. Moreover, a memory-enhanced router with an invariant learning objective is designed to capture transferable routing patterns under distribution shifts. Extensive experiments on six benchmarks show that EvoFG consistently outperforms state-of-the-art baselines, achieving strong and stable zero-shot GAD performance.
💡 Research Summary
Zero‑shot graph anomaly detection (GAD) aims to identify anomalous nodes in previously unseen graphs without any label information from the target domain. Existing zero‑shot GAD methods rely on a single graph neural network (GNN) to extract node embeddings, which limits expressiveness because different graphs may exhibit heterogeneous structural patterns (homophily vs. heterophily), varying attribute semantics, and diverse anomaly mechanisms. To overcome this bottleneck, the authors propose EvoFG, a novel mixture‑of‑experts (MoE) framework equipped with an evolutionary router feature generation pipeline. The core idea is to endow the router—the component that decides which GNN experts to use—with a dynamic, informative, and domain‑invariant feature space. EvoFG proceeds in four tightly coupled stages.
- Standardized Primitive Features – For each node, a set of basic structural descriptors (e.g., similarity to local neighbors, PageRank, smoothness score, 2‑hop clustering coefficient) is computed. These primitives are graph‑agnostic and serve as the seed for further feature engineering.
- LLM‑Based Feature Generation – A large language model (LLM) receives the primitive set as a prompt and, via a chain‑of‑thought reasoning process, proposes new composite features such as “average heterophily of the 3‑hop neighborhood” or “log‑scaled betweenness weighted by edge weight”. This step automatically explores a large combinatorial space of structural descriptors that would be impractical to design manually.
- Shapley‑Guided Feature Selection – Not all generated descriptors are useful for routing. The authors evaluate each candidate’s contribution to router performance using Shapley values, which measure marginal impact across all possible feature subsets. Because exact Shapley computation is exponential, they introduce a sampling‑based approximation together with a lightweight proxy task: the router’s ability to predict expert weighting on a held‑out set. Features whose estimated Shapley value exceeds a threshold are retained, forming the evolving feature set ℱ.
- Memory‑Enhanced Router with Invariant Learning – The router consumes the current ℱ and outputs routing logits G ∈ ℝ^{N×E}, where E is the number of GNN experts. A softmax over G yields per‑expert weights P, enabling soft aggregation of expert embeddings rather than hard selection, which improves robustness under distribution shift. To prevent the router from over‑fitting to spurious correlations in ℱ, a memory bank stores routing patterns learned in previous iterations; during a new iteration the router can retrieve and adapt these patterns, stabilizing training when ℱ changes dramatically. Moreover, the router is trained with an invariant risk minimization (IRM) objective. Random masks are applied to ℱ to create multiple “environments”; the router is optimized to minimize both the average loss across environments and the variance of gradients across environments, encouraging the discovery of graph‑invariant routing signals.
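The Shapley-guided selection step can be sketched with a standard Monte Carlo permutation approximation. This is not the authors' exact procedure, only an illustration of the idea: `proxy_score` is a hypothetical stand-in for the paper's proxy task (router quality on a held-out set), and the threshold is an arbitrary hyperparameter.

```python
import random

def shapley_estimates(features, proxy_score, n_samples=200, seed=0):
    """Monte Carlo approximation of each feature's Shapley value.

    proxy_score(subset) maps a frozenset of feature names to a scalar
    quality score; averaging each feature's marginal contribution over
    random permutations approximates its Shapley value.
    """
    rng = random.Random(seed)
    values = {f: 0.0 for f in features}
    for _ in range(n_samples):
        perm = features[:]
        rng.shuffle(perm)
        subset = frozenset()
        prev = proxy_score(subset)
        for f in perm:
            subset = subset | {f}
            curr = proxy_score(subset)
            values[f] += curr - prev  # marginal contribution of f
            prev = curr
    return {f: v / n_samples for f, v in values.items()}

def select_features(features, proxy_score, threshold=0.0):
    """Keep only features whose estimated Shapley value passes the threshold."""
    sv = shapley_estimates(features, proxy_score)
    return [f for f in features if sv[f] > threshold]
```

For an additive scoring function the estimates are exact; in general, more permutation samples trade compute for lower estimation variance.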
The expert side of the MoE consists of four diverse GNNs (low‑pass, high‑pass, attention‑based, and hybrid architectures). Because node attributes differ across datasets, the authors first apply PCA to reduce dimensionality, then reorder the resulting components by a smoothness score, yielding a unified attribute matrix X̃ that aligns structural frequencies across domains. Each expert processes X̃ and the adjacency matrix A, producing node embeddings H_e. During training, a subset of normal nodes is treated as in‑context examples; the remaining nodes are queries. Cross‑attention reconstructs query embeddings from in‑context embeddings, and a cosine consistency loss encourages high similarity for normal nodes while penalizing anomalies. The final anomaly score for a node is the ℓ₂ reconstruction error between the aggregated expert embeddings and their reconstructions.
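The attribute-alignment step (PCA followed by smoothness reordering) can be sketched as follows. This is a rough illustration, not the paper's exact recipe: it uses the common Laplacian quadratic form zᵀLz / zᵀz as the smoothness score and sorts components smoothest-first; the authors' precise definition and ordering may differ.

```python
import numpy as np

def unify_attributes(X, A, k=4):
    """Project attributes to k PCA components, then reorder the
    components by a Laplacian smoothness score (smoothest first)."""
    # PCA via SVD on centered attributes
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                      # N x k component scores

    # unnormalized graph Laplacian L = D - A
    L = np.diag(A.sum(axis=1)) - A

    # smoothness of each component: z^T L z / z^T z (smaller = smoother)
    smooth = np.array([z @ L @ z / (z @ z + 1e-12) for z in Z.T])
    return Z[:, np.argsort(smooth)]
```

Sorting by smoothness rather than by explained variance is what lets components from different graphs line up by structural frequency instead of by dataset-specific attribute scale.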
Training proceeds in two stages: (i) experts are pretrained independently with the consistency loss; (ii) the router is trained while freezing expert parameters, using the memory‑enhanced architecture and the invariant loss. This decoupling avoids routing collapse (where one expert dominates) and preserves expert diversity.
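The router's invariance objective in stage (ii) can be sketched as below. This is a simplified stand-in, not the authors' implementation: it uses a linear router, random feature masks as environments, and a REx-style variance penalty over environment risks in place of the paper's gradient-variance term; all names and hyperparameters here are illustrative.

```python
import numpy as np

def router_invariant_loss(feats, labels, weights, n_envs=4, mask_p=0.3,
                          lam=1.0, seed=0):
    """Invariance objective for a linear-router sketch.

    Random feature masks create n_envs environments; the total loss is
    the mean per-environment risk plus a variance penalty across
    environments. weights is a (d, E) matrix of routing parameters.
    """
    rng = np.random.default_rng(seed)
    risks = []
    for _ in range(n_envs):
        mask = (rng.random(feats.shape[1]) > mask_p).astype(float)
        logits = (feats * mask) @ weights            # N x E routing logits
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)            # softmax expert weights
        # cross-entropy against (pseudo-)labels for the proxy routing task
        risks.append(-np.log(p[np.arange(len(labels)), labels] + 1e-12).mean())
    risks = np.array(risks)
    return risks.mean() + lam * risks.var()
```

A router that relies on features present in only some environments incurs high risk variance, so minimizing this objective pushes it toward masks-robust, i.e. more transferable, routing signals.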
Extensive experiments on six public benchmarks—including citation networks (Cora, Citeseer), social networks (Reddit), e‑commerce graphs (Amazon‑Photo), and financial transaction graphs (Bitcoin‑OTC)—demonstrate that EvoFG consistently outperforms state‑of‑the‑art zero‑shot GAD baselines (Pretrained‑GNN, ARC, AnomalyGFM, GraphPrompt, and standard MoE). Average AUC improvements range from 7% to 12% absolute, with the most pronounced gains on heterophilic or high‑dimensional graphs where a single GNN struggles. Ablation studies confirm that each component—LLM‑generated features, Shapley‑based selection, memory‑enhanced routing, and invariant learning—contributes positively to performance.
In summary, EvoFG addresses the fundamental limitation of existing generalist GAD models: the static, insufficient router feature space that hampers transferability across heterogeneous graphs. By iteratively evolving router features through LLM generation, rigorously selecting them via Shapley values, reinforcing the router with a memory mechanism, and enforcing domain‑invariant routing decisions, EvoFG achieves robust, high‑quality routing and superior anomaly detection in zero‑shot settings. The work opens avenues for leveraging large language models in automated graph feature engineering and for designing more resilient MoE systems in graph‑centric security and reliability applications.