Automatic Graph Topology-Aware Transformer

Notice: This research summary and analysis were automatically generated using AI technology. For complete accuracy, please refer to the original arXiv source.

Existing efforts are dedicated to designing many topologies and graph-aware strategies for the graph Transformer, which greatly improve the model's representation capabilities. However, manually determining the suitable Transformer architecture for a specific graph dataset or task requires extensive expert knowledge and laborious trials. This article proposes an evolutionary graph Transformer architecture search (EGTAS) framework to automate the construction of strong graph Transformers. We build a comprehensive graph Transformer search space spanning both micro-level and macro-level designs. EGTAS evolves graph Transformer topologies at the macro level and graph-aware strategies at the micro level. Furthermore, a surrogate model based on generic architectural coding is proposed to directly predict the performance of graph Transformers, substantially reducing the evaluation cost of evolutionary search. We demonstrate the efficacy of EGTAS across a range of graph-level and node-level tasks, encompassing both small-scale and large-scale graph datasets. Experimental results and ablation studies show that EGTAS can construct high-performance architectures that rival state-of-the-art manual and automated baselines.


💡 Research Summary

The paper introduces EGTAS (Evolutionary Graph Transformer Architecture Search), a framework that automates the design of high‑performing graph Transformers. Existing work on graph Transformers has shown that carefully crafted topologies (macro‑level) and graph‑aware mechanisms (micro‑level) can dramatically improve representation power, but selecting the right combination for a specific dataset or task still requires deep expertise and extensive trial‑and‑error. EGTAS addresses this gap by defining a comprehensive search space that simultaneously covers macro‑level architectural decisions (number of layers, hidden dimensions, skip‑connection patterns, number of attention heads, feed‑forward ratios) and micro‑level graph‑specific strategies (positional encodings such as Laplacian PE, random‑walk PE, relative encodings; attention score normalizations like Softmax, Cosine, Sparsemax; edge‑aware or node‑wise message passing; normalization layers such as LayerNorm, GraphNorm, PairNorm). Each candidate architecture is encoded as a generic vector of binary, integer, and real‑valued tokens, forming a “universal architecture code”.
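The joint macro/micro encoding described above can be sketched as follows. This is a minimal illustration, not the paper's actual scheme: the option tables and field names are assumptions, and for simplicity every token here is an integer index into its option list (the paper's code also admits binary and real-valued tokens).

```python
import random

# Hypothetical option tables illustrating the macro/micro search space
# (the concrete values and names are assumptions, not the paper's lists).
MACRO_SPACE = {
    "num_layers": [2, 4, 6, 8, 12],
    "hidden_dim": [64, 128, 256, 512],
    "num_heads": [1, 2, 4, 8],
    "ffn_ratio": [1, 2, 4],
}
MICRO_SPACE = {
    "pos_encoding": ["laplacian", "random_walk", "relative", "none"],
    "attn_norm": ["softmax", "cosine", "sparsemax"],
    "norm_layer": ["layernorm", "graphnorm", "pairnorm"],
}


def sample_architecture():
    """Draw one random candidate from the joint macro/micro space."""
    return {k: random.choice(opts)
            for space in (MACRO_SPACE, MICRO_SPACE)
            for k, opts in space.items()}


def encode(arch):
    """Flatten a candidate into an integer code: one index per design choice.
    This plays the role of the 'universal architecture code'."""
    spaces = {**MACRO_SPACE, **MICRO_SPACE}
    return [spaces[k].index(arch[k]) for k in spaces]


def decode(code):
    """Invert encode(): map a flat code back to a concrete architecture."""
    spaces = {**MACRO_SPACE, **MICRO_SPACE}
    return {k: opts[i] for (k, opts), i in zip(spaces.items(), code)}
```

Because every choice is an index, encode/decode is a lossless round trip, which is what lets the evolutionary operators and the surrogate work on flat vectors rather than on model objects.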

To avoid the prohibitive cost of fully training every candidate, the authors train a surrogate model that directly predicts the performance of a graph Transformer from its architecture code. The surrogate is built by first sampling a few thousand diverse architectures, fully training them, and using the resulting validation accuracies (or AUC, depending on the task) as labels for a regression model (implemented as a lightweight Transformer-based regressor). Once trained, the surrogate can estimate the fitness of any new architecture in milliseconds.
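The surrogate idea can be shown with a deliberately simple stand-in: fit a regressor on (architecture code, measured score) pairs, then predict scores for unseen codes. The paper uses a learned Transformer-based regressor; the ridge regression over one-hot features below is only a sketch of the same interface.

```python
import numpy as np


def one_hot(code, sizes):
    """Expand an integer architecture code into a one-hot feature vector,
    where sizes[i] is the number of options at position i."""
    feats = []
    for idx, n in zip(code, sizes):
        v = np.zeros(n)
        v[idx] = 1.0
        feats.append(v)
    return np.concatenate(feats)


class RidgeSurrogate:
    """Minimal surrogate: fit on codes of fully trained architectures and
    their scores, then predict scores for new codes in closed form."""

    def __init__(self, sizes, alpha=1e-3):
        self.sizes = sizes
        self.alpha = alpha  # ridge regularization strength
        self.w = None

    def fit(self, codes, scores):
        X = np.stack([one_hot(c, self.sizes) for c in codes])
        X = np.hstack([X, np.ones((len(X), 1))])  # bias column
        A = X.T @ X + self.alpha * np.eye(X.shape[1])
        self.w = np.linalg.solve(A, X.T @ np.asarray(scores, dtype=float))

    def predict(self, code):
        x = np.append(one_hot(code, self.sizes), 1.0)
        return float(x @ self.w)
```

The key property this sketch preserves is the cost profile: `fit` is paid once on a few thousand labeled architectures, after which each `predict` is a single dot product, so scoring a whole evolutionary population is essentially free.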

The search itself proceeds via an evolutionary algorithm. An initial population of random codes is evaluated by the surrogate; the top‑k individuals are selected via tournament selection, then recombined and mutated to generate offspring. Macro‑level mutations include adding or removing layers, changing skip‑connection topology, or altering head counts, while micro‑level mutations swap one graph‑aware component for another (e.g., replace Laplacian PE with Random‑Walk PE). After a predefined number of generations, the best surrogate‑scored architecture is fully trained to obtain the final performance. This two‑stage process dramatically reduces GPU hours: the authors report a 10‑ to 12‑fold reduction compared with a naïve evolutionary search without a surrogate.
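The search loop described above (random initialization, surrogate scoring, tournament selection, recombination, mutation) can be sketched generically. Population size, generations, and mutation rate here are illustrative defaults, not the paper's settings; `surrogate` is any callable that scores a flat code, such as a trained surrogate's `predict`.

```python
import random


def tournament_select(pop, fitness, k=3):
    """Return the fittest of k randomly drawn individuals."""
    contenders = random.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fitness[i])]


def crossover(a, b):
    """Uniform crossover: each code position inherited from either parent."""
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(code, sizes, rate=0.1):
    """Re-sample each position with probability `rate`. A macro position
    changes e.g. depth or head count; a micro position swaps one
    graph-aware component for another."""
    return [random.randrange(n) if random.random() < rate else g
            for g, n in zip(code, sizes)]


def evolve(surrogate, sizes, pop_size=50, generations=30):
    """Surrogate-guided evolutionary search over flat architecture codes."""
    pop = [[random.randrange(n) for n in sizes] for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [surrogate(c) for c in pop]  # cheap surrogate evaluation
        pop = [mutate(crossover(tournament_select(pop, fitness),
                                tournament_select(pop, fitness)), sizes)
               for _ in range(pop_size)]
    # The best surrogate-scored code is then fully trained for final results.
    return max(pop, key=surrogate)
```

Only the single returned architecture is ever fully trained, which is where the reported order-of-magnitude reduction in GPU hours comes from.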

Experiments span twelve benchmarks covering both graph‑level tasks (molecular property prediction on OGB‑MolPCBA and OGB‑MolHIV) and node‑level tasks (Cora, Citeseer, PubMed, and large‑scale classification on OGB‑Arxiv and OGB‑Products). Baselines include manually designed graph Transformers (Graphormer, GT‑SAGE), recent NAS‑based graph models (GraphNAS, AutoGraph), and classic GNNs (GCN, GAT, GraphSAGE). Across the board, architectures discovered by EGTAS achieve 2–4 percentage points higher accuracy or AUC than the strongest baselines, with particularly notable gains on large, sparse graphs, where the surrogate‑guided search selects Sparsemax attention and lightweight feed‑forward blocks to stay within memory limits. Ablation studies confirm that (i) removing the surrogate inflates search cost by roughly an order of magnitude, and (ii) omitting the micro‑level choices reduces final performance by about 1.5 percentage points, underscoring the importance of both search granularity and cost‑effective evaluation.

The paper also analyzes the evolutionary trajectories. For small, dense graphs, the search frequently converges on edge‑aware attention combined with GraphNorm, whereas for massive graphs it prefers Laplacian positional encodings and Sparsemax, suggesting that the framework adapts to structural characteristics without human intervention. The authors release the code, the universal architecture coding scheme, and the trained surrogate model, facilitating reproducibility and future extensions to other domains such as bioinformatics, chemistry, and social network analysis.

In summary, EGTAS contributes three key innovations: (1) a richly parameterized, jointly macro‑ and micro‑level graph Transformer search space; (2) a generic surrogate model that predicts architecture performance from a compact code, dramatically cutting evaluation cost; and (3) an evolutionary search pipeline that automatically discovers topologies and graph‑aware mechanisms tailored to each dataset. By removing the need for manual expert tuning while achieving or surpassing state‑of‑the‑art results, EGTAS establishes a new paradigm for automated graph Transformer design.