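[Paper Explained] LLM-Driven 3D Scene Generation of Agricultural Simulation Environments
The paper proposes a modular multi-LLM pipeline that converts natural language prompts into 3D agricultural scenes in Unreal Engine by combining asset retrieval, domain knowledge supplied through RAG, and code generation.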
Procedural generation techniques in 3D rendering engines have revolutionized the creation of complex environments, reducing reliance on manual design. Recent approaches using Large Language Models (LLMs) for 3D scene generation show promise but often lack domain-specific reasoning, verification mechanisms, and modular design. These limitations lead to reduced control and poor scalability. This paper investigates the use of LLMs to generate agricultural synthetic simulation environments from natural language prompts, specifically to address the limitations of lacking domain-specific reasoning, verification mechanisms, and modular design. A modular multi-LLM pipeline was developed, integrating 3D asset retrieval, domain knowledge injection, and code generation for the Unreal rendering engine using its API. This results in a 3D environment with realistic planting layouts and environmental context, all based on the input prompt and the domain knowledge. To enhance accuracy and scalability, the system employs a hybrid strategy combining LLM optimization techniques such as few-shot prompting, Retrieval-Augmented Generation (RAG), finetuning, and validation. Unlike monolithic models, the modular architecture enables structured data handling, intermediate verification, and flexible expansion. The system was evaluated using structured prompts and semantic accuracy metrics. A user study assessed realism and familiarity against real-world images, while an expert comparison demonstrated significant time savings over manual scene design. The results confirm the effectiveness of multi-LLM pipelines in automating domain-specific 3D scene generation with improved reliability and precision. Future work will explore expanding the asset hierarchy, incorporating real-time generation, and adapting the pipeline to other simulation domains beyond agriculture.
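Motivation and Objectives
- Enable automatic generation of agricultural simulation environments from natural language prompts.
- Develop a modular pipeline that overcomes the weaknesses of a single LLM in domain reasoning and verification.
- Integrate hierarchical asset retrieval, domain knowledge via retrieval-augmented generation, and Python code generation for Unreal Engine.
- Evaluate the pipeline with qualitative and quantitative metrics and compare it against a single-LLM baseline.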
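Proposed Method
- A three-stage modular pipeline: asset retrieval, domain knowledge enrichment via RAG, and code generation for Unreal Engine.
- A structured asset hierarchy covering growth stages, seasons, and health states of fruits and vegetables, used to map prompts to asset paths.
- Hybrid LLM optimization (few-shot prompting, fine-tuning, RAG, validation) to improve accuracy and reduce hallucinations.
- FAISS-based semantic retrieval over asset paths and domain metadata, with a validation step that checks consistency with the prompt.
- A code-generation LLM that outputs executable Unreal Engine Python scripts; after generation, the asset paths and the alignment with domain knowledge are validated.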
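The asset hierarchy and the post-generation path check described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the nesting order (crop, growth stage, season, health), the `/Game/...` paths, and the function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical asset hierarchy: crop -> growth stage -> season -> health -> asset path.
# All paths are illustrative placeholders, not the paper's actual assets.
ASSET_HIERARCHY = {
    "tomato": {
        "seedling": {"spring": {"healthy": "/Game/Crops/Tomato/Seedling_Spring_Healthy"}},
        "mature": {
            "summer": {
                "healthy": "/Game/Crops/Tomato/Mature_Summer_Healthy",
                "diseased": "/Game/Crops/Tomato/Mature_Summer_Diseased",
            }
        },
    },
}


def resolve_asset(crop, stage, season, health):
    """Walk the hierarchy; return the asset path, or None if any level is missing."""
    node = ASSET_HIERARCHY
    for key in (crop, stage, season, health):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, str) else None


def known_paths(node=ASSET_HIERARCHY):
    """Collect every leaf path stored in the hierarchy."""
    if isinstance(node, str):
        return {node}
    paths = set()
    for child in node.values():
        paths |= known_paths(child)
    return paths


def validate_script_paths(script):
    """Return asset paths referenced in a generated script that the hierarchy does not contain."""
    referenced = set(re.findall(r"/Game/[\w/]+", script))
    return sorted(referenced - known_paths())
```

A check like `validate_script_paths` is one simple way to catch hallucinated or malformed asset paths before a generated script is run in the engine; the paper's own validation step may work differently.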
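Experimental Results
Research Questions
- RQ1: How can LLMs be combined in a modular pipeline to generate domain-specific 3D agricultural scenes from natural language?
- RQ2: Do asset retrieval and domain knowledge enrichment improve accuracy and consistency over a single-LLM approach?
- RQ3: How does the multi-model system affect performance and realism in terms of time efficiency and user evaluation?
- RQ4: How does the hybrid retrieval method (sub-query normalization plus metadata filtering) affect domain-consistent scene generation?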
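To make the hybrid retrieval in RQ4 concrete, here is a minimal sketch of sub-query normalization followed by strict metadata filtering and similarity ranking. It is not the paper's implementation: the pipeline is described as using FAISS for semantic search, whereas this sketch substitutes a plain bag-of-words cosine similarity so that it runs with the standard library alone. The knowledge entries, the synonym table, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge entries; the text and metadata are illustrative only.
KNOWLEDGE = [
    {"crop": "tomato", "season": "summer", "text": "tomato row spacing summer staking"},
    {"crop": "tomato", "season": "spring", "text": "tomato seedling transplant spring"},
    {"crop": "lettuce", "season": "spring", "text": "lettuce row spacing spring shade"},
]

# Hypothetical synonym table used to normalize sub-queries.
SYNONYMS = {"tomatoes": "tomato", "lettuces": "lettuce", "summertime": "summer"}


def normalize(subquery):
    """Lowercase, tokenize, and map known variants to canonical terms."""
    tokens = re.findall(r"[a-z]+", subquery.lower())
    return [SYNONYMS.get(t, t) for t in tokens]


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(subquery, filters, top_k=1):
    """Filter strictly on metadata first, then rank the survivors by similarity."""
    query_vec = Counter(normalize(subquery))
    candidates = [e for e in KNOWLEDGE if all(e.get(k) == v for k, v in filters.items())]
    ranked = sorted(
        candidates,
        key=lambda e: cosine(query_vec, Counter(e["text"].split())),
        reverse=True,
    )
    return ranked[:top_k]
```

Filtering before ranking means an entry for the wrong crop can never be returned, however similar its text is to the query. That is the intuition behind combining strict filters with semantic search.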
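Key Findings
- The modular multi-model pipeline achieved high accuracy in asset retrieval and in alignment with domain metadata.
- Hybrid retrieval with normalization and strict filtering improved exact-match accuracy for domain knowledge (Top-1 71% versus 82% in Table II).
- The code-generation results show that single-field prompts yield executable scripts with correct asset usage and domain consistency; multi-field prompts preserve this but exhibit some spatial layout problems.
- The user study showed moderate perceived prompt match and realism, with some visual discrepancies caused by the assets and by missing terrain elements.
- The expert evaluation showed significant time savings: system-generated scenes were much faster to produce than manually created ones (on average about 49 seconds versus about 94 seconds per scene).
- Compared with the single-LLM baseline, the modular approach performed better on modularity, scalability, correctness, and flexibility, and it reduced hallucinations and path-format errors.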
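The spatial layout problems reported for multi-field prompts suggest why a deterministic placement step can help. The sketch below is a hypothetical illustration, not the paper's layout logic: it computes grid positions for each field and offsets successive fields along the x-axis so that they cannot overlap. The field specifications, the spacing values, and the function names are assumptions; inside the Unreal editor, each position would then be passed to an actor-spawning call.

```python
def field_positions(origin, rows, plants_per_row, row_spacing, plant_spacing):
    """Return (x, y, z) positions for a rectangular field starting at origin."""
    ox, oy = origin
    return [
        (ox + r * row_spacing, oy + p * plant_spacing, 0.0)
        for r in range(rows)
        for p in range(plants_per_row)
    ]


def layout_fields(fields, gap):
    """Place fields side by side along x, separated by `gap`, so they do not overlap."""
    placed = {}
    cursor_x = 0.0
    for name, spec in fields.items():
        placed[name] = field_positions((cursor_x, 0.0), **spec)
        # Width of this field along x is the distance between its first and last row.
        width = (spec["rows"] - 1) * spec["row_spacing"]
        cursor_x += width + gap
    return placed


# Hypothetical field specs; distances are in centimeters, Unreal's default unit.
fields = {
    "tomato": dict(rows=2, plants_per_row=3, row_spacing=100.0, plant_spacing=50.0),
    "lettuce": dict(rows=2, plants_per_row=2, row_spacing=40.0, plant_spacing=30.0),
}
placed = layout_fields(fields, gap=200.0)
```

Computing the coordinates outside the LLM and passing them into the generated script is one possible design choice; it trades some flexibility for layouts that are guaranteed not to collide.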
This explainer was generated by AI and reviewed by a human editor.