FSW-GNN: A Bi-Lipschitz WL-Equivalent Graph Neural Network
Famously, the ability of Message Passing Neural Networks (MPNNs) to distinguish between graphs is limited to graphs separable by the Weisfeiler-Leman (WL) graph isomorphism test, and the strongest MPNNs, in terms of separation power, are WL-equivalent. However, it has been demonstrated that the quality of separation provided by standard WL-equivalent MPNNs can be very low, resulting in WL-separable graphs being mapped to very similar, hardly distinguishable outputs. This phenomenon can be explained by the recent observation that standard MPNNs are not lower-Lipschitz. This paper addresses this issue by introducing FSW-GNN, the first MPNN that is fully bi-Lipschitz with respect to standard WL-equivalent graph metrics. Empirically, we show that our MPNN is competitive with standard MPNNs on several graph learning tasks and is far more accurate on long-range tasks, due to its ability to avoid oversmoothing and oversquashing. Our code is available at https://github.com/yonatansverdlov/Over-squashing.
💡 Research Summary
The paper addresses a fundamental limitation of Message Passing Neural Networks (MPNNs): while the most expressive MPNNs are known to be WL‑equivalent—i.e., they can separate any pair of graphs that the Weisfeiler‑Leman (WL) test can separate—in practice their separation power can be extremely weak. Standard WL‑equivalent MPNNs often map WL‑separable graphs to almost identical embeddings, a phenomenon that recent work attributes to the lack of a lower‑Lipschitz guarantee. This deficiency leads to oversmoothing (node features become indistinguishable after many layers) and oversquashing (information from distant nodes fails to influence a node’s representation).
To remedy this, the authors propose FSW‑GNN (Fourier‑Sliced‑Wasserstein Graph Neural Network), the first MPNN that is fully bi‑Lipschitz with respect to two WL‑equivalent graph metrics: (a) a generalized Doubly‑Stochastic (DS) metric and (b) the Tree Mover’s Distance (TMD). Both metrics quantify how far two graphs are from being WL‑equivalent; the DS metric is based on optimal doubly‑stochastic alignment of adjacency matrices and node features, while TMD measures optimal transport distances between computation trees generated by the WL procedure. The authors extend the DS metric to handle graphs with varying numbers of vertices and continuous node features, proving it remains a WL‑metric.
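Both metrics are built on optimal transport. A useful fact behind the sliced constructions that follow is that in one dimension, optimal transport between equal-size multisets reduces to matching sorted samples. The sketch below illustrates this standard fact; it is not the paper's exact DS metric or TMD, and the function name is our own.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size 1-D multisets.

    In one dimension the optimal transport plan matches sorted samples,
    so the distance is the p-norm of sorted differences. This is an
    illustrative sketch of the OT machinery underlying the DS metric
    and TMD, not the paper's metrics themselves.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.shape == y.shape, "equal-size multisets assumed"
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))
```

For example, the distance between the multisets {0, 1} and {1, 2} is 1: the optimal plan shifts each point by one unit.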
The core technical contribution lies in the message aggregation step. Instead of conventional sum, mean, or max pooling, FSW‑GNN uses a Fourier‑Sliced‑Wasserstein (FSW) embedding to aggregate a node's neighborhood multiset. The FSW embedding projects each neighbor vector onto several random directions, sorts the resulting scalars, and then combines them with learned frequencies to produce a fixed‑dimensional vector. Prior work has shown that the FSW embedding is bi‑Lipschitz on multisets; the authors leverage this property to prove that the entire GNN—message aggregation, node update, and graph readout—is bi‑Lipschitz with respect to both DS and TMD distances. Consequently, there exist constants $0 < c \le C < \infty$ such that for any two graphs $G_1, G_2$,

$$c \cdot d(G_1, G_2) \;\le\; \| F(G_1) - F(G_2) \| \;\le\; C \cdot d(G_1, G_2),$$

where $d$ is either the DS metric or TMD and $F$ denotes the network's graph embedding.
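The aggregation step described above (project, sort, combine with frequencies) can be sketched as follows. This is an illustrative reading of the construction, not the authors' implementation: here the frequencies are drawn at random, whereas in FSW‑GNN they are learned parameters, and the function name and shapes are our own assumptions.

```python
import numpy as np

def fsw_embed(neighbors, n_slices=8, n_freqs=4, seed=0):
    """Sketch of a Fourier-Sliced-Wasserstein multiset embedding.

    `neighbors`: (n, d) array, the multiset of neighbor feature vectors.
    Returns a fixed-dimensional vector of size n_freqs * n_slices,
    independent of the multiset ordering.
    """
    rng = np.random.default_rng(seed)
    n, d = neighbors.shape
    # 1. Project the multiset onto random unit directions ("slices").
    dirs = rng.standard_normal((n_slices, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = neighbors @ dirs.T                  # (n, n_slices)
    # 2. Sort the scalar projections along each slice: this yields the
    #    quantile function of each projected 1-D distribution and makes
    #    the embedding permutation-invariant.
    sorted_proj = np.sort(proj, axis=0)        # (n, n_slices)
    # 3. Combine the sorted values with frequencies (random here,
    #    learned in FSW-GNN), Fourier-style, over a quantile grid.
    freqs = rng.standard_normal(n_freqs)
    t = np.linspace(0.0, 1.0, n)               # quantile grid
    basis = np.cos(np.outer(freqs, t))         # (n_freqs, n)
    features = basis @ sorted_proj / n         # (n_freqs, n_slices)
    return features.ravel()
```

Because step 2 sorts along each slice, permuting the rows of `neighbors` leaves the output unchanged, which is exactly the multiset invariance the aggregation requires.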