Relational Graph Transformer
Relational Deep Learning (RDL) is a promising approach for building state-of-the-art predictive models on multi-table relational data by representing it as a heterogeneous temporal graph. However, commonly used Graph Neural Network models suffer from fundamental limitations in capturing complex structural patterns and long-range dependencies that are inherent in relational data. While Graph Transformers have emerged as powerful alternatives to GNNs on general graphs, applying them to relational entity graphs presents unique challenges: (i) Traditional positional encodings fail to generalize to massive, heterogeneous graphs; (ii) existing architectures cannot model the temporal dynamics and schema constraints of relational data; (iii) existing tokenization schemes lose critical structural information. Here we introduce the Relational Graph Transformer (RelGT), the first graph transformer architecture designed specifically for relational tables. RelGT employs a novel multi-element tokenization strategy that decomposes each node into five components (features, type, hop distance, time, and local structure), enabling efficient encoding of heterogeneity, temporality, and topology without expensive precomputation. Our architecture combines local attention over sampled subgraphs with global attention to learnable centroids, incorporating both local and database-wide representations. Across 21 tasks from the RelBench benchmark, RelGT consistently matches or outperforms GNN baselines by up to 18%, establishing Graph Transformers as a powerful architecture for Relational Deep Learning.
💡 Research Summary
The paper introduces Relational Graph Transformer (RelGT), the first graph‑transformer architecture explicitly designed for relational deep learning (RDL) on multi‑table relational databases. Traditional RDL pipelines convert relational databases into relational entity graphs (REGs), i.e., heterogeneous temporal graphs over the rows of each table, and apply graph neural networks (GNNs) such as GraphSAGE or the Heterogeneous Graph Transformer (HGT). While effective, GNNs suffer from limited structural expressiveness, poor long‑range information propagation, and difficulty handling temporal dynamics and schema‑defined constraints.
RelGT addresses these shortcomings through three core innovations. First, it proposes a multi‑element tokenization scheme that decomposes each node into five components: (i) raw feature vector, (ii) node type derived from the source table, (iii) hop distance to the training seed node, (iv) a time encoding that captures the timestamp of the entity, and (v) a local‑structure encoding derived from the sampled subgraph (e.g., distribution of neighbor types). Each component is processed by a dedicated encoder (multimodal, type, hop, time, and structural encoders) and projected into a common embedding space; the final token representation is the element‑wise sum of these embeddings. This design eliminates the need for costly pre‑computed positional encodings while preserving heterogeneity, temporality, and schema‑level information.
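The five-element tokenization can be illustrated with a minimal NumPy sketch. The dimensions, the use of lookup tables for the type/hop encoders, and linear projections for the feature, time, and structure encoders are illustrative assumptions here, not the paper's exact encoder designs; only the overall pattern, five per-element embeddings summed element-wise into one token, follows the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16          # shared embedding dimension (assumed for illustration)
NUM_TYPES = 3   # node types, one per source table (assumed)
MAX_HOPS = 4    # hop distances 0..3 from the training seed node

# Lookup tables standing in for the learned type and hop encoders.
type_table = rng.normal(size=(NUM_TYPES, D))
hop_table = rng.normal(size=(MAX_HOPS, D))

# Linear maps standing in for the multimodal feature, time, and
# local-structure encoders (all hypothetical stand-ins).
W_feat = rng.normal(size=(8, D))            # raw feature vector -> D
W_time = rng.normal(size=(1, D))            # scalar relative time -> D
W_struct = rng.normal(size=(NUM_TYPES, D))  # neighbor-type histogram -> D

def tokenize(feat, node_type, hop, rel_time, type_hist):
    """Sum the five element embeddings into a single token vector."""
    return (
        feat @ W_feat
        + type_table[node_type]
        + hop_table[hop]
        + np.array([rel_time]) @ W_time
        + type_hist @ W_struct
    )

token = tokenize(
    feat=rng.normal(size=8),
    node_type=1,                            # e.g., a row from an "orders" table
    hop=2,                                  # two hops from the seed node
    rel_time=-0.5,                          # normalized offset from the seed timestamp
    type_hist=np.array([0.5, 0.25, 0.25]),  # local neighbor-type distribution
)
assert token.shape == (D,)
```

Because the five embeddings are summed rather than concatenated, every token stays at dimension D regardless of how many element encoders are used.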
Second, RelGT combines local and global attention. For every training seed node, a fixed‑size, time‑aware subgraph is sampled. Tokens within this subgraph attend to each other via full self‑attention (local attention), capturing fine‑grained relational patterns. In parallel, a set of learnable global centroid tokens (soft centroids) interacts with the local tokens, providing database‑wide context. The two attention streams are interleaved across transformer layers, enabling efficient long‑range dependency modeling without the O(N²) cost of naïve global attention over the full graph: each seed node costs only O(k² + k·C) for a sampled subgraph of k tokens and C centroids, independent of the total number of nodes N.
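The two attention streams can be sketched with plain scaled dot-product attention in NumPy. This is a simplified single-head, single-layer view under assumed dimensions: here each local token simply cross-attends to the centroids and the streams are combined by addition; the paper's exact layer structure, including how the centroids themselves are updated, may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Standard scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
K, C, D = 32, 4, 16   # subgraph tokens, global centroids, model dim (assumed)

tokens = rng.normal(size=(K, D))      # tokens of one sampled subgraph
centroids = rng.normal(size=(C, D))   # learnable database-level centroids

# Local stream: full self-attention among the K subgraph tokens -> O(K^2).
local_out = attention(tokens, tokens, tokens)

# Global stream: each token reads database-wide context from the
# C centroids via cross-attention -> O(K*C).
global_out = attention(tokens, centroids, centroids)

# Combine the two streams (simple addition, an illustrative choice).
out = local_out + global_out
assert out.shape == (K, D)
```

With K fixed by the sampler and C a small constant, the per-seed cost stays constant as the database grows, which is the point of avoiding attention over all N nodes.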
Third, the model incorporates temporal‑aware neighbor sampling and a dedicated time encoder to prevent data leakage and to emphasize recent interactions. The time encoder learns relative ordering rather than absolute timestamps, allowing the model to adapt to varying temporal granularities across tables.
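A relative time encoding of this kind can be sketched as follows. The sinusoidal form and frequency ladder are common stand-ins assumed for illustration, the paper describes a learned encoder; what the sketch preserves is the key property stated above: the encoding depends only on offsets from the seed timestamp, never on absolute time.

```python
import numpy as np

def time_encode(seed_time, neighbor_times, dim=8):
    """Sinusoidal encoding of time offsets relative to the seed node.

    Illustrative stand-in for a learned time encoder; `dim` must be even.
    Temporal-aware sampling guarantees neighbor_times <= seed_time,
    so no future information (leakage) enters the encoding.
    """
    delta = np.asarray(neighbor_times, dtype=float) - seed_time
    freqs = 1.0 / (10.0 ** np.arange(dim // 2))   # geometric frequency ladder
    angles = delta[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

enc = time_encode(seed_time=100.0, neighbor_times=[100.0, 99.0, 60.0])
assert enc.shape == (3, 8)

# Shifting every timestamp by the same amount leaves the encoding
# unchanged: only relative offsets matter, not absolute times.
enc_shifted = time_encode(1100.0, [1100.0, 1099.0, 1060.0])
assert np.allclose(enc, enc_shifted)
```

The multiple frequencies let the same encoder resolve both fine-grained offsets (seconds apart) and coarse ones (months apart), which matches the varying temporal granularities across tables mentioned above.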
The authors evaluate RelGT on the RelBench benchmark, which comprises 21 real‑world tasks spanning recommendation, fraud detection, churn prediction, inventory management, and more. Baselines include HGT, GraphSAINT‑based GraphSAGE, RelGNN, ContextGNN, and hybrid tabular‑GNN approaches. RelGT consistently matches or exceeds these baselines, with improvements of up to 18% on the hardest tasks and an average gain of 9.3% across the suite. Notably, RelGT attains comparable or lower GPU memory consumption than HGT despite not relying on pre‑computed Laplacian eigenvectors for positional encoding, and training time is reduced by roughly 15–20% thanks to the efficient local‑global attention scheme.
Ablation studies reveal that each token component contributes meaningfully: removing any of the five elements degrades performance by 2–4% on average, and eliminating the global centroids reduces long‑range modeling capability, causing up to a 12% drop. The paper also discusses limitations, such as sensitivity to the chosen hop distance and subgraph size, and the need to tune the number of global centroids for very large datasets. Future work is suggested in dynamic centroid updates, meta‑learning for automatic token‑parameter tuning, and distributed training strategies for massive enterprise graphs.
In summary, RelGT offers a principled, scalable solution for learning from relational databases by jointly encoding heterogeneity, temporality, and schema‑driven structure without expensive pre‑computation. Its strong empirical results demonstrate that graph transformers can surpass state‑of‑the‑art GNNs in relational deep learning, opening new avenues for end‑to‑end modeling of complex enterprise data.