Mamba Neural Operator: Who Wins? Transformers vs. State-Space Models for PDEs
Partial differential equations (PDEs) are widely used to model complex physical systems, but solving them efficiently remains a significant challenge. Recently, Transformers have emerged as the preferred architecture for PDEs due to their ability to capture intricate dependencies. However, they struggle with representing continuous dynamics and long-range interactions. To overcome these limitations, we introduce the Mamba Neural Operator (MNO), a novel framework that enhances neural operator-based techniques for solving PDEs. MNO establishes a formal theoretical connection between structured state-space models (SSMs) and neural operators, offering a unified structure that can adapt to diverse architectures, including Transformer-based models. By leveraging the structured design of SSMs, MNO captures long-range dependencies and continuous dynamics more effectively than traditional Transformers. Through extensive analysis, we show that MNO significantly boosts the expressive power and accuracy of neural operators, making it not just a complement but a superior framework for PDE-related tasks, bridging the gap between efficient representation and accurate solution approximation. Our code is available at https://github.com/Math-ML-X/Mamba-Neural-Operator.
💡 Research Summary
The paper introduces the Mamba Neural Operator (MNO), a novel framework that integrates the structured state‑space model (SSM) known as Mamba into neural operator architectures for solving partial differential equations (PDEs). The authors begin by framing PDEs as mappings between infinite‑dimensional function spaces and reviewing existing data‑driven operator methods such as DeepONet, Fourier Neural Operator (FNO), and Transformer‑based operators. While Transformers excel at capturing long‑range dependencies through global attention, they suffer from quadratic memory and compute costs, and they handle continuous spatial data inefficiently because the domain must be discretised into tokens.
MNO addresses these drawbacks by establishing a formal theoretical equivalence between neural‑operator layers and time‑varying linear SSMs. In this view, a neural‑operator layer updates a hidden state h(t) via a linear transition matrix A, injects the input through B·u(t), and produces the output through C·h(t)+D·u(t). The matrices A, B, C, D become learnable parameters. By employing Mamba’s zero‑order‑hold (ZOH) discretisation, the continuous‑time dynamics are preserved while the computational complexity reduces to O(N), where N is the sequence length (or grid size). This linear‑time formulation naturally captures long‑range interactions without the need for explicit attention mechanisms.
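The continuous dynamics described above can be made concrete with a minimal numpy sketch. It assumes a diagonal real transition matrix A (a common simplification for SSMs, though the paper's released code may parameterise it differently), applies the ZOH discretisation Ā = exp(ΔA), B̄ = A⁻¹(Ā − I)B, and runs the resulting linear recurrence in a single O(N) pass over the sequence. The function names are illustrative, not taken from the paper's codebase:

```python
import numpy as np

def zoh_discretise(A_diag, B, delta):
    """Zero-order-hold discretisation, assuming a diagonal state matrix.
    A_bar = exp(delta * A); B_bar = (A_bar - 1) / A * B (elementwise)."""
    A_bar = np.exp(delta * A_diag)
    B_bar = ((A_bar - 1.0) / A_diag)[:, None] * B
    return A_bar, B_bar

def ssm_scan(A_diag, B, C, D, u, delta):
    """Linear-time recurrence: h_k = A_bar * h_{k-1} + B_bar u_k,
    y_k = C h_k + D u_k. Cost is O(N) in the sequence length N,
    with no pairwise attention matrix ever formed."""
    A_bar, B_bar = zoh_discretise(A_diag, B, delta)
    h = np.zeros(A_diag.shape[0])
    ys = []
    for u_k in u:
        h = A_bar * h + B_bar @ np.atleast_1d(u_k)
        ys.append(C @ h + D * u_k)
    return np.array(ys)
```

With stable (negative) eigenvalues in A and a constant input, the state converges to the continuous steady state h = −B/A, which is one quick sanity check that the ZOH step preserves the underlying dynamics rather than merely approximating them.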
The paper provides two key theoretical contributions: (1) a proof that the update rule of a generic neural‑operator layer can be expressed as a linear SSM, and (2) a mapping of Mamba’s parameterisation into the operator context, showing that existing Transformer‑based operators are a special case of a more general SSM‑based operator. Consequently, MNO can be plugged into any existing architecture, either replacing attention blocks with Mamba blocks or augmenting them in a hybrid fashion.
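The plug-in claim can be illustrated with a toy sketch: if an operator layer is written as "token mixer + pointwise map + residuals," then attention and an SSM scan are interchangeable mixers, which is the structural sense in which attention-based operators become a special case. Everything below is a hypothetical illustration (the mixer signatures, the placeholder MLP, and the scalar decay `a` are not from the paper):

```python
import numpy as np

def attention_mixer(x):
    # Global softmax self-attention: O(N^2) pairwise interactions.
    scores = x @ x.T / np.sqrt(x.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ x

def ssm_mixer(x, a=0.9):
    # Diagonal SSM recurrence: O(N) sequential state update per channel.
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for k in range(x.shape[0]):
        h = a * h + x[k]
        out[k] = h
    return out

def operator_block(x, mixer):
    # Generic operator layer: swappable token mixing + residual,
    # followed by a (placeholder) pointwise channel map + residual.
    x = x + mixer(x)
    return x + np.tanh(x)
```

Because `operator_block` is agnostic to the mixer, swapping `attention_mixer` for `ssm_mixer` changes the complexity from quadratic to linear while leaving the rest of the architecture untouched, mirroring the hybrid/replacement strategies the paper describes.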
Empirical evaluation covers several benchmark PDEs: 1‑D heat equation, 2‑D Navier‑Stokes flow, and reaction‑diffusion systems. The authors compare four configurations—pure Transformer, pure FNO, MNO‑Transformer (Transformer where attention is swapped for Mamba), and MNO‑FNO (FNO with Mamba blocks). Metrics include L2 error, mean absolute error, memory consumption, and runtime. Across all tasks, MNO‑based models achieve 15‑30 % lower L2 error and 40‑60 % reduction in memory usage relative to their pure counterparts. Notably, for long‑time integration (time steps > 1000), pure Transformers experience severe memory blow‑up and accuracy degradation, whereas MNO maintains stable convergence.
The authors also discuss implementation details: Mamba’s state‑space matrices are parameterised with diagonal‑plus‑low‑rank structures for efficiency; the ZOH discretisation aligns with standard numerical schemes (Euler, Runge‑Kutta) but retains exact linear dynamics. Training uses AdamW with cosine learning‑rate decay, and the codebase (including data and pretrained checkpoints) is released publicly.
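The efficiency of the diagonal-plus-low-rank (DPLR) structure comes from never materialising the dense state matrix: with A = diag(λ) − PPᵀ and rank r ≪ n, a matrix-vector product costs O(nr) instead of O(n²). A small sketch under that assumption (the exact parameterisation in the released code may differ, e.g. complex-valued λ):

```python
import numpy as np

def dplr_matrix(lam, P):
    """Dense diagonal-plus-low-rank state matrix A = diag(lam) - P P^T.
    Formed explicitly here only for reference/testing."""
    return np.diag(lam) - P @ P.T

def dplr_matvec(lam, P, v):
    """A @ v computed without forming A: O(n*r) instead of O(n^2)."""
    return lam * v - P @ (P.T @ v)
```

The same structure keeps the ZOH discretisation cheap, since the required resolvent-style operations reduce to diagonal arithmetic plus a rank-r correction.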
Limitations are acknowledged. Experiments are confined to 1‑D and 2‑D domains; extension to 3‑D PDEs and complex boundary conditions remains to be demonstrated. Performance is sensitive to the initialisation of the SSM parameters and to the choice of discretisation step Δ, suggesting that more robust training strategies are needed. Future work is proposed on multi‑scale state‑space hierarchies, hybrid attention‑SSM layers, and physics‑informed regularisation to further improve generalisation.
In summary, the Mamba Neural Operator unifies the expressive power of Transformer‑based operators with the efficiency and continuous‑time modeling capabilities of structured state‑space models. By doing so, it delivers a superior, scalable, and memory‑efficient solution for a broad class of PDE problems, positioning MNO as a promising new direction in data‑driven scientific computing.