LLM-Driven 3D Scene Generation of Agricultural Simulation Environments
Procedural generation techniques in 3D rendering engines have revolutionized the creation of complex environments, reducing reliance on manual design. Recent approaches that use Large Language Models (LLMs) for 3D scene generation show promise but often lack domain-specific reasoning, verification mechanisms, and modular design, which reduces control and limits scalability. This paper investigates the use of LLMs to generate agricultural synthetic simulation environments from natural-language prompts, directly addressing these three limitations. A modular multi-LLM pipeline was developed, integrating 3D asset retrieval, domain-knowledge injection, and code generation for Unreal Engine through its API. The result is a 3D environment with realistic planting layouts and environmental context, derived from the input prompt and the injected domain knowledge. To enhance accuracy and scalability, the system employs a hybrid strategy combining LLM optimization techniques such as few-shot prompting, Retrieval-Augmented Generation (RAG), fine-tuning, and validation. Unlike monolithic models, the modular architecture enables structured data handling, intermediate verification, and flexible expansion. The system was evaluated using structured prompts and semantic accuracy metrics; a user study assessed realism and familiarity against real-world images, and an expert comparison demonstrated significant time savings over manual scene design. The results confirm the effectiveness of multi-LLM pipelines in automating domain-specific 3D scene generation with improved reliability and precision. Future work will explore expanding the asset hierarchy, incorporating real-time generation, and adapting the pipeline to simulation domains beyond agriculture.
💡 Research Summary
The paper presents a modular, multi‑large‑language‑model (LLM) pipeline that automatically generates realistic agricultural simulation environments in Unreal Engine from natural‑language prompts. Recognizing that existing procedural content generation (PCG) tools lack domain‑specific agricultural logic and that single‑LLM approaches suffer from hallucinations and limited scalability, the authors design a three‑stage architecture: (1) Asset Retrieval LLM, (2) Domain Knowledge LLM (implemented with Retrieval‑Augmented Generation, RAG), and (3) Code Generation LLM.
In the first stage, a GPT‑4‑based sub‑query decomposition parses the user prompt into discrete fields (crop type, variety, growth stage, season, health state). These fields are normalized to a predefined asset hierarchy containing 672 possible path combinations. Semantic embeddings generated with OpenAI’s text‑embedding‑3‑small model are indexed in FAISS; a similarity search retrieves the most appropriate asset paths, and a final GPT‑4 validation step ensures internal consistency (e.g., matching season across all fields).
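The retrieval step described above can be sketched as follows. This is a toy stand-in: the paper indexes OpenAI text-embedding-3-small vectors in FAISS, whereas here hand-made 3-dimensional vectors and a brute-force cosine search substitute for both, and the asset paths are illustrative, not the paper's actual 672-entry hierarchy.

```python
import math

# Hypothetical asset index: path -> toy embedding vector. In the real system
# these would be text-embedding-3-small vectors stored in a FAISS index.
ASSET_INDEX = {
    "apple/pink_lady/young/fall/healthy":    [0.9, 0.1, 0.0],
    "apple/pink_lady/mature/summer/healthy": [0.7, 0.6, 0.1],
    "corn/sweet/young/spring/diseased":      [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_asset_paths(query_vec, top_k=1):
    """Return the top_k asset paths ranked by similarity to the query vector."""
    ranked = sorted(ASSET_INDEX.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [path for path, _ in ranked[:top_k]]

# A query vector close to the "young fall" apple embedding retrieves that path.
best = retrieve_asset_paths([0.88, 0.15, 0.02])[0]
```

In the full pipeline a final GPT-4 validation pass would then check the retrieved paths for cross-field consistency (e.g. a single season across all fields) before they move downstream.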
The second stage enriches each retrieved asset with structured agricultural metadata. The authors build a custom RAG pipeline where each metadata entry is a JSON object describing crop spacing, height, disease susceptibility, irrigation needs, and rendering parameters. The same embedding model is used to encode a descriptor string derived from the asset path (e.g., “healthy young Pink Lady apple in fall”), and top‑k semantic matches are filtered for exact hierarchy alignment. If no suitable entry exists, the system logs a fallback. The output is a “recipe” that couples visual assets with agronomic parameters, guaranteeing that the generated scene respects realistic planting rules and seasonal behavior.
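A minimal sketch of this metadata-enrichment stage, assuming a simple dictionary lookup in place of the paper's embedding-based top-k match: the field names (`spacing_m`, `irrigation`, and so on) are illustrative, not the paper's actual schema.

```python
# Hypothetical metadata store: asset path -> agronomic parameters.
METADATA = {
    "apple/pink_lady/young/fall/healthy": {
        "spacing_m": 3.5,
        "height_m": 2.0,
        "irrigation": "drip",
        "disease_susceptibility": "low",
    },
}

fallback_log = []  # paths for which no metadata entry was found

def build_recipe(asset_path):
    """Couple a validated asset path with its agronomic metadata.

    When no entry exists, log a fallback (as the paper's system does) and
    apply a hypothetical generic default so generation can still proceed.
    """
    entry = METADATA.get(asset_path)
    if entry is None:
        fallback_log.append(asset_path)
        entry = {"spacing_m": 2.0}  # assumed default, not from the paper
    return {"asset": asset_path, **entry}
```

The resulting "recipe" dict is what the third stage consumes, so visual assets and planting rules travel together through the pipeline.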
The third stage translates the enriched recipe into executable Python code that drives Unreal Engine’s API. The Code Generation LLM has been fine‑tuned on a curated dataset of prompt‑script pairs, and it receives three inputs: the original user prompt, the validated asset paths, and the JSON recipe. The generated script follows a modular structure: scene initialization, asset instantiation, row/column placement based on spacing guidelines, scaling/rotation, and environmental effects (lighting, foliage variation). A post‑generation validator checks for missing assets, Unreal‑specific API errors, and mismatches between metadata and scene parameters, thereby reducing hallucinations and syntax errors.
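The row/column placement logic that the generated scripts implement can be illustrated in isolation. In the real pipeline the computed transforms would feed Unreal Engine's Python API (actor-spawning calls such as `unreal.EditorLevelLibrary.spawn_actor_from_object`); here the coordinates are just computed so the layout logic is testable without the engine, and the function signature is an assumption, not the paper's generated code.

```python
def grid_positions(rows, cols, spacing_m):
    """Row/column planting layout: (x, y) position in metres for each plant.

    spacing_m comes from the recipe's agronomic metadata, so plant density
    in the scene follows the crop's real-world spacing guidelines.
    """
    return [(r * spacing_m, c * spacing_m)
            for r in range(rows)
            for c in range(cols)]

# A 2-row, 3-column Pink Lady block at 3.5 m spacing yields 6 plant positions.
layout = grid_positions(2, 3, 3.5)
```

A post-generation validator would then cross-check the emitted script against the recipe (asset paths present, spacing values matching the metadata) before execution.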
Evaluation comprises quantitative benchmarks across three prompt categories: detailed single‑field, generic single‑field, and generic multi‑field. The hybrid approach (few‑shot + RAG) achieves 98 % accuracy on detailed prompts, 71 % on generic single‑field prompts, and a precision of 74 % with a recall of 89 % on multi‑field prompts—outperforming pure few‑shot, pure fine‑tuning, and pure RAG baselines. A custom evaluation framework, built on manually curated ground‑truth sets, measures asset retrieval, domain alignment, and code generation quality.
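The multi-field precision and recall figures above are set comparisons between generated scenes and curated ground truth; a minimal sketch of that computation (with illustrative asset names) is:

```python
def precision_recall(predicted, ground_truth):
    """Precision and recall of a predicted asset set against ground truth."""
    pred, truth = set(predicted), set(ground_truth)
    tp = len(pred & truth)  # correctly placed assets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

# Two of three placed assets are correct, and two of four expected assets
# appear: precision 2/3, recall 1/2.
p, r = precision_recall(
    ["apple_tree", "corn_row", "weed_patch"],
    ["apple_tree", "corn_row", "wheat_field", "irrigation_line"],
)
```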
Human‑centered studies further validate the system. In a user study, participants rated the visual realism of generated scenes against real‑world photographs, yielding an average score of 4.2 / 5. In an expert timing experiment, manual construction of comparable scenes required an average of 45 minutes, whereas the proposed pipeline completed the task in under 15 minutes, representing a 68 % reduction in design time.
Key contributions include: (1) a hierarchical asset taxonomy tailored to agricultural simulation; (2) a two‑stage semantic retrieval and RAG mechanism that injects domain knowledge while preserving consistency; (3) a fine‑tuned code generation LLM with automated validation for reliable Unreal Engine scripting; and (4) comprehensive quantitative and qualitative evaluation demonstrating practical benefits. Limitations are acknowledged: the current asset library is a limited subset, and scaling to a full spectrum of crops, varieties, and environmental conditions will require larger indexes and more extensive metadata curation.
Future work aims to (a) expand the asset hierarchy and metadata repository, (b) enable real‑time, on‑the‑fly scene generation, (c) port the pipeline to other 3D engines such as Blender and Unity, and (d) generalize the approach to other simulation domains like urban planning, environmental restoration, and robotics training. The study establishes that a modular multi‑LLM architecture can effectively bridge natural language intent and domain‑specific procedural generation, delivering reliable, scalable, and realistic 3D agricultural simulations.