ArGEnT: Arbitrary Geometry-encoded Transformer for Operator Learning

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Learning solution operators for systems with complex, varying geometries and parametric physical settings is a central challenge in scientific machine learning. In many-query regimes such as design optimization, control and inverse problems, surrogate modeling must generalize across geometries while allowing flexible evaluation at arbitrary spatial locations. In this work, we propose Arbitrary Geometry-encoded Transformer (ArGEnT), a geometry-aware attention-based architecture for operator learning on arbitrary domains. ArGEnT employs Transformer attention mechanisms to encode geometric information directly from point-cloud representations with three variants-self-attention, cross-attention, and hybrid-attention-that incorporates different strategies for incorporating geometric features. By integrating ArGEnT into DeepONet as the trunk network, we develop a surrogate modeling framework capable of learning operator mappings that depend on both geometric and non-geometric inputs without the need to explicitly parametrize geometry as a branch network input. Evaluation on benchmark problems spanning fluid dynamics, solid mechanics and electrochemical systems, we demonstrate significantly improved prediction accuracy and generalization performance compared with the standard DeepONet and other existing geometry-aware saurrogates. In particular, the cross-attention transformer variant enables accurate geometry-conditioned predictions with reduced reliance on signed distance functions. By combining flexible geometry encoding with operator-learning capabilities, ArGEnT provides a scalable surrogate modeling framework for optimization, uncertainty quantification, and data-driven modeling of complex physical systems.

💡 Research Summary

The paper introduces ArGEnT (Arbitrary Geometry‑encoded Transformer), a novel geometry‑aware attention architecture designed to enable operator learning on domains with arbitrary and varying shapes. Traditional surrogate models, including DeepONet, struggle to handle geometry that changes across problem instances because they either require explicit parameterization of the geometry or rely on structured grids that cannot represent irregular domains efficiently. ArGEnT addresses this limitation by embedding a Transformer directly into the trunk of a DeepONet, where the Transformer processes point‑cloud representations of the domain (coordinates, optional signed distance function values, and padding masks).

Three Transformer variants are proposed:

Self‑attention – The point cloud itself (including coordinates and optional SDF) is used to construct the query, key, and value matrices. Rotary Position Embeddings (RoPE) inject relative positional information, allowing the model to capture long‑range geometric relationships. This variant implicitly learns geometry from the distribution of points but is sensitive to mini‑batch sampling because it relies on global context.
Cross‑attention – Geometry is encoded as a fixed set of points (key and value) while the query points, which can be sampled arbitrarily at inference time, form the query matrix. This decouples geometry from the evaluation locations, enabling predictions at any spatial coordinate without needing SDFs. The cross‑attention mechanism is robust to mini‑batch sampling and reduces memory consumption.
Hybrid‑attention – A cross‑attention layer is followed by a self‑attention layer, combining explicit geometry encoding with implicit inter‑point relationships. Experiments show this hybrid design yields the best overall accuracy, especially for problems where geometry strongly influences the solution.

ArGEnT is integrated with DeepONet by using the Transformer‑based trunk to encode geometry and query information, while a separate branch network (often a simple MLP) processes non‑geometric parameters such as material properties, boundary conditions, or source terms. The final output is obtained by taking the inner product of trunk and branch representations, preserving the original DeepONet formulation.

Training is performed with a mini‑batch strategy: at each step, 3,000 query points are randomly sampled from the full training set to compute the mean‑squared‑error loss. Adam optimizer with an initial learning rate of 0.001 and a decay factor of 0.99 every 200 steps is used for 100 k steps on NVIDIA H100 GPUs. The authors note that self‑attention and hybrid‑attention can suffer from degraded performance under this sampling regime because the random mini‑batches may not reflect the global point distribution; increasing batch size mitigates this but at a high computational cost. Cross‑attention is largely unaffected.

The method is evaluated on a suite of benchmark problems spanning fluid dynamics (Navier‑Stokes flow around varying shapes), solid mechanics (elastic deformation of structures with changing geometry), and electrochemical systems (reaction‑diffusion on irregular electrodes). Compared with the standard DeepONet and other geometry‑aware surrogates, ArGEnT achieves 15–30 % lower MSE on average. The cross‑attention variant, in particular, delivers accurate geometry‑conditioned predictions without relying on signed distance functions, while the hybrid‑attention variant provides the best generalization to unseen geometries.

Key insights from the study include:

Direct point‑cloud encoding eliminates the need for handcrafted geometry descriptors or mesh‑based preprocessing.
Transformer attention mechanisms naturally capture long‑range geometric dependencies, which are crucial for PDE‑based operators.
Decoupling geometry (keys/values) from query locations (queries) enables flexible evaluation at arbitrary points, a desirable property for many‑query scenarios such as design optimization and uncertainty quantification.
The choice of attention variant trades off between expressiveness (self‑attention) and robustness to mini‑batch sampling (cross‑attention); hybrid‑attention offers a balanced solution.

In conclusion, ArGEnT provides a scalable, geometry‑flexible surrogate modeling framework that can be combined with existing operator‑learning architectures. By handling both geometric and non‑geometric inputs within a unified attention‑based representation, it opens the door to efficient many‑query analyses, inverse design, and data‑driven modeling of complex physical systems where domain shape varies dramatically.

ArGEnT: Arbitrary Geometry-encoded Transformer for Operator Learning

💡 Research Summary

Comments & Academic Discussion

Leave a Comment