SwarmFoam: An OpenFOAM Multi-Agent System Based on Multiple Types of Large Language Models

Numerical simulation is one of the mainstream methods in scientific research, typically performed by professional engineers. With the advancement of multi-agent technology, using collaborating agents to replicate human behavior shows immense potential for intelligent Computational Fluid Dynamics (CFD) simulations. Several multi-agent systems based on Large Language Models (LLMs) have been proposed; however, they exhibit significant limitations when dealing with complex geometries. This paper introduces a new multi-agent simulation framework, SwarmFoam. SwarmFoam integrates functionalities such as multi-modal perception, intelligent error correction, and retrieval-augmented generation, aiming to achieve more complex simulations through dual parsing of images and high-level instructions. Experimental results demonstrate that SwarmFoam adapts well to simulation inputs from different modalities. The overall pass rate across 25 test cases was 84%, with natural-language and multi-modal input cases achieving pass rates of 80% and 86.7%, respectively. SwarmFoam will further promote the development of intelligent agent methods for CFD.


💡 Research Summary

SwarmFoam presents a novel multi‑agent framework that automates OpenFOAM‑based computational fluid dynamics (CFD) simulations by integrating large language models (LLMs) with multimodal perception capabilities. The system addresses the limitations of prior CFD‑automation efforts, which relied exclusively on textual prompts and struggled with complex geometries and physical specifications. SwarmFoam introduces an “Observer” agent that simultaneously parses textual descriptions and visual inputs (e.g., design drawings, simulation screenshots) using multimodal LLMs. The extracted information—fluid type, boundary conditions, geometric dimensions, and vertex coordinates—is then split into simulation and post‑processing subtasks.
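The Observer's extracted fields and the subsequent split into simulation and post-processing subtasks can be sketched as a simple data hand-off. This is a minimal illustration, not the paper's actual data model: the `ParsedCase` container and `split_subtasks` function are hypothetical names chosen to mirror the fields the summary lists (fluid type, boundary conditions, dimensions, vertex coordinates).

```python
from dataclasses import dataclass

@dataclass
class ParsedCase:
    """Hypothetical container for the fields the Observer is said to extract
    from combined image/text input."""
    fluid_type: str
    boundary_conditions: dict       # e.g. {"inlet": "fixedValue"}
    dimensions: dict                # e.g. {"L": 1.0, "H": 0.5}
    vertices: list                  # e.g. [(0, 0, 0), (1, 0, 0), ...]

def split_subtasks(case: ParsedCase) -> dict:
    """Split the parsed data into the two subtask groups described in the
    summary: a simulation task and a post-processing task."""
    return {
        "simulation": {
            "fluid": case.fluid_type,
            "boundaries": case.boundary_conditions,
            "geometry": {"dimensions": case.dimensions,
                         "vertices": case.vertices},
        },
        "post_processing": {"targets": ["pressure", "velocity"]},
    }
```

Downstream agents (Architect, InputWriter) would then consume only the subtask relevant to them.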

The workflow comprises six specialized agents:

  1. Observer – receives user requirements, performs image‑text embedding, and publishes parsed simulation data.
  2. Architect – translates the parsed data into a structured case blueprint, selecting domain, solver, and case category, and defines the required file hierarchy.
  3. InputWriter – leverages a Retrieval‑Augmented Generation (RAG) subsystem that indexes six local OpenFOAM knowledge bases (execution files, case structures, command lists, configuration files, solver descriptions, and solver help). By retrieving the most relevant text chunks and injecting them into prompts, InputWriter generates accurate configuration files such as blockMeshDict, controlDict, fvSolution, and fvSchemes.
  4. Runner – executes the generated OpenFOAM commands (mesh generation, decomposition, solver run), captures logs, and forwards any error messages.
  5. Reviewer – analyzes error logs, identifies the root-cause file using a “first-error-priority” strategy, queries the RAG store for similar past failures, and provides corrective feedback to InputWriter. This approach reduces token consumption by roughly 30% and minimizes unnecessary re-runs.
  6. ParaMaster – automatically creates ParaView scripts for visualizing results, and can also interpret output images through the multimodal LLM, delivering quantitative insights from visual data.
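The Runner–Reviewer–InputWriter interaction above amounts to a bounded correction loop built around the first-error-priority idea: only the first fatal error in an OpenFOAM log is analyzed, since later errors usually cascade from it. The sketch below is a plausible reconstruction under that reading, not the paper's implementation; `write_case`, `run_case`, and `review` are placeholder callables standing in for the InputWriter, Runner, and Reviewer agents.

```python
import re

def first_error(log: str):
    """First-error-priority: return only the first FATAL ERROR block from an
    OpenFOAM log (OpenFOAM prefixes fatal errors with '--> FOAM FATAL ...'),
    or None if the log contains no such block."""
    m = re.search(r"--> FOAM FATAL (?:IO )?ERROR.*?(?=\n\n|\Z)", log, re.S)
    return m.group(0) if m else None

def run_with_review(write_case, run_case, review, max_rounds=3):
    """Hypothetical InputWriter -> Runner -> Reviewer loop:
    write_case(feedback) regenerates the case dictionaries,
    run_case() returns (ok, log), and review(error) turns the isolated
    first error into corrective feedback for the next round."""
    feedback = None
    for _ in range(max_rounds):
        write_case(feedback)                 # InputWriter (re)generates files
        ok, log = run_case()                 # Runner executes mesh/solve steps
        if ok:
            return True
        feedback = review(first_error(log))  # Reviewer isolates the root cause
    return False
```

Bounding the loop (`max_rounds`) is one way the described design could cap token spend when a case cannot be repaired.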

Two multimodal perception strategies were explored. The first performs a pre‑parsing step where the image is converted to textual descriptors before being handed to InputWriter; the second feeds raw image and text embeddings directly to the multimodal LLM for simultaneous mesh‑dictionary generation. Ablation studies demonstrated a clear advantage for the pre‑parsing approach, leading to its adoption in the final system.
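The two strategies differ only in where the image leaves the pipeline, which a short sketch makes concrete. This is an illustrative skeleton under assumed interfaces: `vision_llm`, `text_llm`, and `multimodal_llm` are placeholder callables, not APIs named in the paper.

```python
def preparse_then_generate(image_bytes, text_prompt, vision_llm, text_llm):
    """Strategy 1 (pre-parsing, the one adopted): a multimodal model first
    converts the drawing into textual geometry descriptors; only that text
    reaches the dictionary-writing model."""
    descriptors = vision_llm(
        image=image_bytes,
        prompt="Describe the geometry: vertices, dimensions, boundaries.")
    return text_llm(f"{text_prompt}\n\nGeometry extracted from drawing:\n{descriptors}")

def direct_multimodal_generate(image_bytes, text_prompt, multimodal_llm):
    """Strategy 2 (rejected in the ablation): raw image and text go to a
    single multimodal model that must emit the mesh dictionary directly."""
    return multimodal_llm(image=image_bytes, prompt=text_prompt)
```

The ablation result is intuitive under this framing: pre-parsing forces the geometric facts into an inspectable intermediate text, so errors can be caught before dictionary generation.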

Experimental evaluation involved 25 diverse CFD test cases covering 2‑D and 3‑D domains, laminar and turbulent flows, and a variety of boundary conditions. Cases were divided into pure‑text input and combined text‑image (multimodal) input groups. SwarmFoam achieved an overall success rate of 84%, with 80% for text‑only and 86.7% for multimodal inputs. Compared to earlier systems such as MetaOpenFOAM‑1/‑2 and OpenFOAM‑GPT, which reported success rates between 60% and 70%, SwarmFoam shows a substantial improvement. Error analysis revealed that most failures originated from file‑path mismatches, format inconsistencies, or inaccurate physical parameter specifications; the first‑error‑priority mechanism effectively isolated these root causes.

Limitations identified include sensitivity of image parsing to resolution and quality, incomplete reconstruction of highly intricate 3‑D geometries, and the need for continual updates of the RAG knowledge base to cover newer OpenFOAM releases and specialized solvers. Future work will focus on integrating high‑resolution 3‑D vision models, automating knowledge‑base refresh pipelines, and adding physics‑consistency validation modules to further enhance reliability.

In summary, SwarmFoam demonstrates that a coordinated suite of specialized agents, powered by both text‑only and multimodal LLMs, can autonomously drive complex CFD workflows from high‑level user intent to validated simulation results. By bridging the gap between natural language, visual design data, and domain‑specific engineering knowledge, SwarmFoam paves the way for broader accessibility of advanced fluid‑dynamics analysis to users without deep CFD expertise.

