PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-based approaches improve flexibility but often incur high computational cost and suffer from limited interpretability. We introduce \texttt{AutoNumerics}, a multi-agent framework that autonomously designs, implements, debugs, and verifies numerical solvers for general PDEs directly from natural language descriptions. Unlike black-box neural solvers, our framework generates transparent solvers grounded in classical numerical analysis. The framework incorporates a coarse-to-fine execution strategy and a residual-based self-verification mechanism. Experiments on 24 canonical and real-world PDE problems demonstrate that \texttt{AutoNumerics} achieves competitive or superior accuracy compared to existing neural and LLM-based baselines, and correctly selects numerical schemes based on PDE structural properties, suggesting its viability as an accessible paradigm for automated PDE solving.
Partial differential equations (PDEs) form the mathematical foundation of modern physics, engineering, and many areas of scientific computing. Accurately solving PDEs is therefore a central task in computational research. Traditionally, constructing a reliable numerical solver for a new PDE requires substantial expertise in numerical analysis, including the selection of appropriate discretization schemes (e.g., finite difference, finite element, or spectral methods) and verification of stability and convergence conditions such as the Courant-Friedrichs-Lewy (CFL) constraint (LeVeque, 2007). These classical approaches provide strong mathematical guarantees and interpretability, but their expert-driven design can limit accessibility and slow solver development for newly arising PDE models.
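To make the CFL constraint concrete: for 1D linear advection $u_t + a u_x = 0$ discretized with an explicit upwind scheme, stability requires $|a|\,\Delta t / \Delta x \le 1$. The following minimal Python sketch of a stability-aware time-step choice is our own illustration, not code from this paper:

\begin{verbatim}
# Illustrative sketch (not from the paper): choosing a stable time step
# for 1D linear advection u_t + a u_x = 0 under the CFL condition
# |a| * dt / dx <= 1 for an explicit upwind scheme.

def cfl_timestep(a: float, dx: float, safety: float = 0.9) -> float:
    """Largest stable dt for explicit upwind advection, with a safety factor."""
    return safety * dx / abs(a)

dx = 1.0 / 256          # grid spacing
a = 2.0                 # advection speed
dt = cfl_timestep(a, dx)
print(f"dx={dx:.5f}, stable dt<={dx/abs(a):.5f}, chosen dt={dt:.5f}")
\end{verbatim}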
Neural network-based approaches such as physics-informed neural networks (PINNs) (Raissi et al., 2019) and operator-learning frameworks (Lu et al., 2019; Li et al., 2020) reduce reliance on handcrafted discretizations but introduce new concerns around computational cost and interpretability. Large language models (LLMs) have recently demonstrated strong capabilities in scientific code generation (Zhang et al., 2024), and existing LLM-assisted PDE efforts include neural solver design (He et al., 2025; Jiang & Karniadakis, 2025), tool-oriented systems that invoke libraries such as FEniCS (Liu et al., 2025; Wu et al., 2025), and code-generation paradigms (Li et al., 2025). However, these approaches either produce black-box networks, are constrained by fixed library APIs, or lack mechanisms for autonomous debugging and correctness verification. We propose that LLMs can serve as numerical architects that directly generate transparent solver code from first principles, preserving interpretability while automating solver construction.
Translating this vision into a reliable system poses several technical challenges. First, LLM-generated code often contains syntax errors or logical flaws, and debugging these errors on high-resolution grids is both time-consuming and computationally wasteful. Second, verifying solver correctness becomes difficult for PDEs lacking analytical solutions. Third, large-scale temporal simulations may lead to memory exhaustion. We address these challenges with three corresponding solutions. A coarse-to-fine execution strategy first debugs logic errors on low-resolution grids before running on high-resolution grids. A residual-based self-verification mechanism evaluates solver quality for problems without analytical solutions by computing PDE residual norms. A history decimation mechanism enables large-scale temporal simulations through sparse storage of intermediate states.
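The following minimal sketch illustrates how these three mechanisms could interact; the function names and the heat-equation test case $u_t = \alpha u_{xx}$ are our own assumptions for illustration, not the paper's implementation:

\begin{verbatim}
# Illustrative sketch (all names are ours, not the paper's code):
# coarse-to-fine execution, residual-based self-verification, and
# history decimation, shown on the 1D heat equation u_t = alpha*u_xx.
import numpy as np

def solve_heat(nx: int, nt: int, alpha: float = 0.1, stride: int = 10):
    """Explicit FTCS solver storing every stride-th state (history decimation)."""
    dx = 1.0 / (nx - 1)
    dt = 0.4 * dx**2 / alpha                 # within the explicit stability limit
    u = np.sin(np.pi * np.linspace(0.0, 1.0, nx))  # u = 0 at both boundaries
    snaps = [u.copy()]
    for n in range(1, nt + 1):
        u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
        if n % stride == 0:                  # sparse storage of intermediate states
            snaps.append(u.copy())
    return dx, dt * stride, np.array(snaps)  # dt between *stored* snapshots

def residual_norm(snaps, dx, dt, alpha=0.1):
    """RMS of the PDE residual u_t - alpha*u_xx, discretized independently of
    the solver (centered in time) so it is not zero by construction."""
    u_t = (snaps[2:] - snaps[:-2]) / (2.0 * dt)
    u_xx = (snaps[1:-1, 2:] - 2.0 * snaps[1:-1, 1:-1] + snaps[1:-1, :-2]) / dx**2
    return np.sqrt(np.mean((u_t[:, 1:-1] - alpha * u_xx) ** 2))

# Stage 1: cheap coarse run to surface logic errors (exceptions, NaNs) first.
dx, dt, snaps = solve_heat(nx=17, nt=50)
assert np.isfinite(snaps).all(), "coarse run diverged: fix logic before refining"

# Stage 2: expensive fine run, accepted only if the residual norm is small.
dx, dt, snaps = solve_heat(nx=129, nt=2000)
print(f"fine-grid residual norm: {residual_norm(snaps, dx, dt):.2e}")
\end{verbatim}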
Building on these design principles, we propose \texttt{AutoNumerics}, a multi-agent autonomous framework. The system receives natural language problem descriptions, proposes multiple candidate numerical strategies through a planning agent, implements executable solvers, and systematically evaluates their correctness and performance. We evaluate the framework on 24 representative PDE problems spanning canonical benchmarks and real-world applications. Results demonstrate consistent numerical scheme selection, stable solver synthesis, and reliable accuracy across diverse PDE classes.
Position relative to prior work. As outlined above, existing LLM-assisted PDE efforts fall into three categories: neural solver design (He et al., 2025; Jiang & Karniadakis, 2025), tool-oriented systems that invoke libraries such as FEniCS (Liu et al., 2025; Wu et al., 2025), and code-generation paradigms (Li et al., 2025). \texttt{AutoNumerics} differs from all three. It generates interpretable classical numerical schemes (not black-box networks), automatically detects and filters ill-designed or non-expert numerical plan configurations, derives discretizations from first principles (not fixed library APIs), and includes a coarse-to-fine execution strategy with residual-based self-verification for autonomous correctness assessment. A detailed review of related work is provided in Appendix A.
Contributions. The primary contributions of this work are:
• A multi-agent framework (\texttt{AutoNumerics}) that autonomously constructs transparent numerical PDE solvers from natural language descriptions.
• A reasoning module that detects ill-designed or non-expert PDE specifications and proactively filters or revises numerical plans that may lead to instability or invalid solutions.
• A coarse-to-fine execution strategy that decouples logic debugging from stability validation.
• A residual-based self-verification mechanism for solver evaluation without analytical solutions.
• A benchmark suite of 200 PDEs and systematic evaluation on 24 representative problems, with comparisons to neural network baselines and CodePDE.
\texttt{AutoNumerics} consists of multiple specialized LLM agents coordinated by a central dispatcher. The system takes a natural language PDE problem description as input and produces executable numerical solver code.
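As a rough structural sketch of this coordination (the agent names and dispatch loop below are our illustrative assumptions, not the paper's actual interfaces; all LLM calls are stubbed out):

\begin{verbatim}
# Illustrative sketch only: hypothetical agent roles and a dispatch loop.
# None of these names come from the paper; LLM calls are stubbed out.

def planner(problem: str) -> list[str]:
    """Would query an LLM for candidate numerical schemes; stubbed here."""
    return ["explicit finite difference", "Crank-Nicolson", "spectral"]

def coder(problem: str, plan: str) -> str:
    """Would ask an LLM to emit solver code for the chosen plan; stubbed."""
    return f"# solver for: {problem}\n# scheme: {plan}\n"

def verifier(code: str) -> bool:
    """Would run the coarse-to-fine execution and residual check; stubbed."""
    return "Crank-Nicolson" in code

def dispatcher(problem: str) -> str | None:
    """Central loop: plan, implement, verify; return the first passing solver."""
    for plan in planner(problem):
        code = coder(problem, plan)
        if verifier(code):
            return code
    return None  # no candidate passed self-verification

print(dispatcher("Solve u_t = 0.1 * u_xx on [0,1] with u(0)=u(1)=0"))
\end{verbatim}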