Expressive Power of Graph Transformers via Logic
Transformers are the basis of modern large language models, but relatively little is known about their precise expressive power on graphs. We study the expressive power of graph transformers (GTs) by Dwivedi and Bresson (2020) and GPS-networks by Rampásek et al. (2022), both under soft-attention and average hard-attention. Our study covers two scenarios: the theoretical setting with real numbers and the more practical case with floats. With reals, we show that in restriction to vertex properties definable in first-order logic (FO), GPS-networks have the same expressive power as graded modal logic (GML) with the global modality. With floats, GPS-networks turn out to be as expressive as GML with the counting global modality. The latter result is absolute, not restricting to properties definable in a background logic. We also obtain similar characterizations for GTs in terms of propositional logic with the global modality (for reals) and the counting global modality (for floats).
💡 Research Summary
This paper investigates the expressive power of graph transformers (GTs) and GPS‑networks through the lens of logical formalisms. The authors consider two numerical settings—real numbers, which are common in theoretical analyses, and floating‑point numbers, which reflect practical implementations. For each setting they study both soft‑attention and average hard‑attention mechanisms, and they focus on vertex classification without positional encodings.
The first major result concerns real‑valued models. When restricting attention to vertex properties definable in first‑order logic (FO), GPS‑networks are shown to be exactly as expressive as graded modal logic (GML) equipped with a non‑counting global modality (denoted GML + G). This means that GPS‑networks can express global existential and universal statements (e.g., “there exists a vertex with property P” or “all vertices satisfy Q”) but cannot perform absolute counting such as “at least ten vertices have label p”. The proof introduces a novel bisimulation notion called global‑ratio graded bisimilarity (∼G%) and establishes a van Benthem/Rosen‑style theorem: any FO‑formula invariant under ∼G% is equivalent to a GML + G formula. Consequently, GTs with real inputs are characterized by propositional logic with the same non‑counting global modality (PL + G). Both characterizations hold under soft‑attention and average hard‑attention, assuming sum aggregation in the underlying GNN layers.
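To make the logics concrete, here is a toy model checker for graded modal logic with a non‑counting global modality. The formula syntax, graph encoding, and function names are illustrative choices of this summary, not constructions from the paper:

```python
# Toy evaluator for GML + G formulas over finite labelled graphs.
# Formulas are nested tuples (our own encoding, not the paper's):
#   ("prop", "p")      vertex is labelled p
#   ("not", f), ("and", f, g)
#   ("dia", k, f)      graded modality: at least k neighbours satisfy f
#   ("E", f)           global modality: some vertex in the graph satisfies f

def holds(graph, labels, v, f):
    op = f[0]
    if op == "prop":
        return f[1] in labels[v]
    if op == "not":
        return not holds(graph, labels, v, f[1])
    if op == "and":
        return holds(graph, labels, v, f[1]) and holds(graph, labels, v, f[2])
    if op == "dia":
        k, sub = f[1], f[2]
        # count satisfying neighbours locally
        return sum(holds(graph, labels, u, sub) for u in graph[v]) >= k
    if op == "E":
        # global modality: non-counting, only existence over all vertices
        return any(holds(graph, labels, u, f[1]) for u in graph)
    raise ValueError(f"unknown operator {op!r}")

# Path graph 0 - 1 - 2, where only vertex 2 is labelled p.
graph = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: set(), 1: set(), 2: {"p"}}

# "Some vertex satisfies p" holds everywhere via the global modality,
assert holds(graph, labels, 0, ("E", ("prop", "p")))
# while the graded modality counts only among neighbours.
assert not holds(graph, labels, 1, ("dia", 2, ("prop", "p")))
```

Note that the global modality `("E", f)` only checks existence; expressing “at least ten vertices satisfy p” globally would require the counting global modality of the float setting below.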
The second set of results deals with floating‑point inputs. Here the authors exploit the underflow phenomenon of floats (sufficiently tiny values become exactly zero) to show that GPS‑networks become exactly as expressive as GML with a counting global modality (GML + GC). This logic can state absolute counting properties such as “the graph contains at least k vertices labelled p”. Importantly, this characterization is absolute—it does not rely on a background FO restriction—and it applies to any reasonable aggregation function (sum, max, mean). In contrast to the real case, relative counting (“more vertices with label p than with label q”) is no longer expressible, making the two numeric regimes incomparable. The authors also prove that float‑based GTs correspond to propositional logic with the counting global modality (PL + GC).
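The underflow behaviour that the float characterization relies on is easy to observe directly. The following is only an illustration of the arithmetic phenomenon in IEEE 754 doubles, not the paper's construction:

```python
# Float underflow: below the smallest positive subnormal double,
# values round to exactly 0.0. This kind of thresholding is what
# lets float-based models detect absolute counts, per the summary.

tiny = 5e-324            # smallest positive subnormal double
assert tiny > 0.0        # still strictly positive...
assert tiny / 2 == 0.0   # ...but halving it underflows to exact zero

# By contrast, with ideal real arithmetic tiny / 2 would remain
# positive, so no such hard threshold ever appears.
```

Because averaged quantities like 1/n eventually hit such hard zero thresholds as n grows, float arithmetic can realize absolute cut-offs that real arithmetic cannot.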
Methodologically, the paper bridges graph neural network (GNN) theory and logical expressiveness. It shows that GPS‑networks with real numbers are equivalent to GNNs augmented with a non‑counting global readout, while float‑based GPS‑networks match GNNs with a counting global readout. The work builds on prior characterizations of GNNs via GML and extends them to hybrid transformer‑GNN architectures.
Beyond the core theorems, the paper discusses extensions to word‑shaped graphs, graph‑level classification, non‑Boolean tasks, and the impact of adding positional encodings. It emphasizes that the study is purely theoretical; no empirical evaluation is presented.
Overall, the paper provides a clear logical taxonomy of graph transformer models, revealing that the choice between real and floating‑point arithmetic fundamentally changes which global logical operations can be captured. This insight offers a principled guide for designing graph learning architectures that require specific global reasoning capabilities.