In-Context System Identification for Nonlinear Dynamics Using Large Language Models


Sparse Identification of Nonlinear Dynamics (SINDy) is a powerful method for discovering parsimonious governing equations from data, but it often requires expert tuning of candidate libraries. We propose an LLM-aided SINDy pipeline that iteratively refines candidate equations with a large language model (LLM) in the loop via in-context learning. The pipeline begins with a baseline SINDy model fit using an adaptive library and then enters an LLM-guided refinement cycle. At each iteration, the current best equations, error metrics, and domain-specific constraints are summarized in a prompt to the LLM, which suggests new equation structures. These candidate equations are parsed against a defined symbolic form and evaluated on training and test data. The pipeline uses simulation-based error as the primary metric but also assesses structural similarity to the ground truth, including matching functional forms, key terms, couplings, and qualitative behavior. A stopping criterion ends refinement early if the test error falls below a threshold (NRMSE < 0.1) or once a maximum of 10 iterations is reached. Finally, the best model is selected. We evaluate this LLM-aided SINDy on 63 dynamical-system datasets (ODEBench) and on the March-Leuba model of a boiling water reactor. Compared against classical SINDy, the LLM loop consistently improves symbolic recovery, achieving higher equation similarity to the ground truth and lower test RMSE for cases with complex dynamics. This work demonstrates that an LLM can effectively guide SINDy’s search through equation space, integrating data-driven error feedback with domain-inspired symbolic reasoning to discover governing equations that are not only accurate but also structurally interpretable.


💡 Research Summary

The paper introduces a novel closed‑loop pipeline that augments the Sparse Identification of Nonlinear Dynamics (SINDy) framework with a large language model (LLM) to automatically discover governing equations for complex dynamical systems. Classical SINDy relies on a hand‑crafted library of candidate functions; if the true dynamics lie outside this library, recovery fails. The authors address this limitation by treating the LLM as a constrained hypothesis generator that proposes new symbolic structures based on error feedback and domain knowledge.

The workflow proceeds as follows:

1. From a multivariate trajectory, simple statistical descriptors (range, variance, oscillatory or saturating behavior, approximate period) are extracted for each state. These descriptors, together with variable names, units, known parameters, and the current best and baseline equations, are embedded in a structured prompt. The prompt also encodes soft priors over admissible function families (e.g., favoring polynomial, trigonometric, exponential, and logarithmic forms) and a short memory of recently rejected candidates.
2. The LLM returns multiple candidate equation templates in a strictly parseable format, obeying a hard grammar that guarantees linearity in the unknown coefficients and limits operators to the allowed set.
3. An automated validation pipeline first checks syntactic compliance, then enforces sparsity limits and redundancy filters. Valid candidates are fitted on the training segment using sparse regression to obtain coefficients.
4. Each fitted model is forward-simulated on a held-out test interval; models that diverge, time out, or produce large numerical errors receive a heavy penalty.
5. Candidates are scored with a multi-objective metric J = α·NRMSE + β·complexity + γ·structural-penalty, balancing predictive accuracy, parsimony, and adherence to the structural priors. The best candidate becomes the new “current model.”
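The multi-objective score in step (5) can be sketched in a few lines of Python. The weights `ALPHA`, `BETA`, `GAMMA` and the divergence penalty below are illustrative placeholders, since the summary gives the form of J but not the values the authors use:

```python
import math

# Hypothetical weights: the paper specifies the form of J, not these values.
ALPHA, BETA, GAMMA = 1.0, 0.05, 0.2
DIVERGENCE_PENALTY = 1e6  # heavy penalty for models that diverge or time out

def nrmse(y_true, y_pred):
    """Normalized RMSE: RMSE divided by the range of the reference signal."""
    n = len(y_true)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)
    span = max(y_true) - min(y_true) or 1.0  # guard against a constant signal
    return rmse / span

def score_candidate(y_true, y_pred, n_terms, n_prior_violations, diverged=False):
    """J = ALPHA*NRMSE + BETA*complexity + GAMMA*structural penalty."""
    if diverged:
        return DIVERGENCE_PENALTY
    return (ALPHA * nrmse(y_true, y_pred)
            + BETA * n_terms                  # complexity = number of terms
            + GAMMA * n_prior_violations)     # deviations from structural priors
```

Under this form, a sparser candidate can win even with slightly worse predictive error, which is what keeps the recovered equations parsimonious.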

The loop repeats up to ten iterations or until the maximum test NRMSE falls below 0.1, whichever occurs first. At each iteration the algorithm focuses on the state with the largest error (“error focus”) and, if progress stalls, increases the diversity of LLM sampling to encourage exploration. After termination, the overall best model observed across all iterations is returned, with a safeguard that prevents selecting an LLM‑refined model that is substantially worse than the baseline when the baseline is deemed reliable.
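This control flow can be summarized in a minimal sketch, with hypothetical `propose` (the LLM call) and `evaluate` (fit plus forward simulation, returning per-state test NRMSE values) callbacks standing in for the real components:

```python
MAX_ITERS = 10       # iteration cap from the paper
NRMSE_TARGET = 0.1   # early-stopping threshold on worst per-state test NRMSE

def refine(baseline_model, propose, evaluate):
    """Sketch of the refinement loop; `propose(model, focus_state, temperature)`
    and `evaluate(model)` are hypothetical hooks, not the authors' API."""
    best_model, best_err = baseline_model, max(evaluate(baseline_model))
    current, temperature = baseline_model, 0.2
    for _ in range(MAX_ITERS):
        if best_err < NRMSE_TARGET:
            break  # early stop: worst per-state test NRMSE below threshold
        errs = evaluate(current)
        focus = errs.index(max(errs))          # "error focus": worst state
        current = propose(current, focus, temperature)
        err = max(evaluate(current))
        if err < best_err:                     # track best across all iterations
            best_model, best_err = current, err
        else:
            temperature = min(1.0, temperature + 0.2)  # stalled: explore more
    # Initializing `best` at the baseline doubles as the safeguard: an
    # LLM-refined model is returned only if it actually beats the baseline.
    return best_model, best_err
```

Raising the sampling temperature only on stalls keeps early iterations exploitative and later ones exploratory, mirroring the diversity schedule described above.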

Experimental evaluation uses the ODEBench suite (63 diverse ODE systems ranging from 1‑ to 4‑dimensional, including chaotic attractors, predator‑prey, oscillators, and reaction kinetics) and a reduced‑order March‑Leuba boiling water reactor model. For each system a single trajectory is used for training and a separate trajectory for testing. The baseline is a conventional SINDy run with a comprehensive fixed library (polynomials up to degree 3, sin, cos, exp, log, 1/x, etc.). Both methods share the same initial library; the LLM‑augmented approach expands the library adaptively based on context and performance.
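The fixed baseline library can be represented as a dictionary of named callables; this one-variable version is a deliberate simplification, since the actual experiments use multivariate libraries with cross terms:

```python
import math

def fixed_library():
    """Sketch of the fixed baseline library for a single state variable x."""
    return {
        "1":      lambda x: 1.0,
        "x":      lambda x: x,
        "x^2":    lambda x: x ** 2,
        "x^3":    lambda x: x ** 3,
        "sin(x)": lambda x: math.sin(x),
        "cos(x)": lambda x: math.cos(x),
        "exp(x)": lambda x: math.exp(x),
        "log(x)": lambda x: math.log(x) if x > 0 else float("nan"),
        "1/x":    lambda x: 1.0 / x if x != 0 else float("nan"),
    }

def evaluate_library(lib, xs):
    """Build the feature matrix Theta(X): one row per sample, one column per term."""
    return [[f(x) for f in lib.values()] for x in xs]
```

In this representation, the LLM-augmented variant simply extends the dictionary adaptively, e.g. adding an entry like `lib["x*exp(x)"] = lambda x: x * math.exp(x)` when error feedback points to a missing rational-exponential term.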

Results show that the LLM‑aided pipeline consistently reduces test NRMSE by roughly 30 % compared with the baseline and improves structural similarity (matching functional forms, couplings, and qualitative behavior) by over 20 %. The gains are most pronounced for systems whose true dynamics involve non‑standard combinations such as x·exp(y), 1/(1+x²)·sin(z), or other rational‑exponential terms that are absent from the fixed library. In the March‑Leuba reactor case, the LLM is steered toward polynomial and bilinear terms that reflect known neutron‑kinetics feedback, successfully recovering the canonical coupling structure.

Key contributions include: (i) leveraging LLMs as a source of expert‑level intuition within a data‑driven loop, (ii) maintaining a fully automated, transparent validation‑fit‑simulate pipeline that preserves the interpretability of symbolic regression, (iii) introducing a multi‑objective scoring function that explicitly trades off accuracy against complexity and domain‑specific priors. The approach therefore bridges the gap between purely data‑driven sparse regression and end‑to‑end black‑box neural models.

Future work suggested by the authors involves testing robustness to noisy measurements, automating prompt engineering via meta‑learning, integrating ensemble LLM proposals, and coupling the method with uncertainty quantification techniques to provide confidence bounds on the discovered equations. Overall, the paper demonstrates that large language models can meaningfully guide symbolic equation discovery, extending SINDy’s applicability to systems with previously inaccessible functional forms while retaining the transparency and analytical tractability prized by the scientific community.

