QUICK REVIEW

[Paper Review] 3D-GPT: Procedural 3D Modeling with Large Language Models

Chunyi Sun, Junlin Han|arXiv (Cornell University)|Oct 19, 2023

Human Motion and Animation | Cited by 12

One-Sentence Summary

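3D-GPT uses a three-agent LLM framework to translate natural-language instructions into Python scripts that drive procedural generation in Blender. This enables instruction-driven 3D content creation and editing without any model training.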

ABSTRACT

In the pursuit of efficient automated content creation, procedural generation, leveraging modifiable parameters and rule-based systems, emerges as a promising approach. Nonetheless, it could be a demanding endeavor, given its intricate nature necessitating a deep understanding of rules, algorithms, and parameters. To reduce workload, we introduce 3D-GPT, a framework utilizing large language models (LLMs) for instruction-driven 3D modeling. 3D-GPT positions LLMs as proficient problem solvers, dissecting the procedural 3D modeling tasks into accessible segments and appointing the apt agent for each task. 3D-GPT integrates three core agents: the task dispatch agent, the conceptualization agent, and the modeling agent. They collaboratively achieve two objectives. First, it enhances concise initial scene descriptions, evolving them into detailed forms while dynamically adapting the text based on subsequent instructions. Second, it integrates procedural generation, extracting parameter values from enriched text to effortlessly interface with 3D software for asset creation. Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results, but also collaborates effectively with human designers. Furthermore, it seamlessly integrates with Blender, unlocking expanded manipulation possibilities. Our work highlights the potential of LLMs in 3D modeling, offering a basic framework for future advancements in scene generation and animation.
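
The abstract's second objective is extracting parameter values from enriched text. A minimal sketch can illustrate it. In 3D-GPT an LLM (the modeling agent) performs this step. The sketch below swaps in a plain regular-expression parser so that it runs without a model. `ParamSpec`, `FLOWER_SPECS`, and `extract_params` are hypothetical names. The flower parameters and their ranges are illustrative assumptions, not Infinigen's actual interface.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ParamSpec:
    """Hypothetical spec for one procedural parameter (not Infinigen's real API)."""
    name: str
    low: float
    high: float
    default: float


# Illustrative parameters for a hypothetical procedural flower generator.
FLOWER_SPECS = [
    ParamSpec("petal_count", 3, 40, 8),
    ParamSpec("petal_length", 0.1, 2.0, 0.5),
    ParamSpec("stem_height", 0.05, 1.5, 0.3),
]


def extract_params(enriched_text: str, specs: List[ParamSpec]) -> Dict[str, float]:
    """Read 'name: value' or 'name = value' pairs from enriched text.

    Rule-based stand-in for the LLM modeling agent: values outside a
    parameter's range are clamped, and missing ones fall back to defaults.
    """
    text = enriched_text.lower()
    params = {}
    for spec in specs:
        # "petal_count" should match "petal count: 12" as well as "petal_count=12".
        pattern = spec.name.replace("_", "[ _]") + r"\s*[:=]\s*(-?\d+(?:\.\d+)?)"
        match = re.search(pattern, text)
        value = float(match.group(1)) if match else float(spec.default)
        params[spec.name] = float(min(max(value, spec.low), spec.high))
    return params


params = extract_params("A tall tulip. Petal count: 6, stem height = 2.4.", FLOWER_SPECS)
print(params)  # {'petal_count': 6.0, 'petal_length': 0.5, 'stem_height': 1.5}
```

Clamping each value into a documented range reflects how procedural generators expose bounded, modifiable parameters. An LLM-based extractor would also handle paraphrases such as "a dozen petals", which this regex does not.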

Research Motivation and Goals

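Show how LLMs can use a multi-agent system to decompose 3D modeling tasks into manageable subtasks.
Achieve instruction-driven 3D content synthesis through procedural generation and parameter extraction.
Show that LLMs can generate Python scripts that interact with Blender to create and edit assets.
Evaluate how well LLMs collaborate with human designers to produce coherent 3D scenes.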

Proposed Method

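Introduces a three-agent system (a task dispatch agent, a conceptualization agent, and a modeling agent) that handles planning, description enrichment, and parameter inference.
Prepares a procedural generation library (Infinigen) with function documentation, readable code, required information, and usage examples, so that LLMs can call the Blender API.
For each instruction, the task dispatch agent selects the functions needed. The conceptualization agent enriches the description with the details those functions' parameters require. The modeling agent infers the parameter values and generates Python code that calls the Blender functions.
Keeps a memory of earlier modifications, so that follow-up instructions can edit the scene and the scene evolves consistently.
Opts to generate Python code rather than 3D output directly, to exploit the flexibility of real-world 3D software.
Renders results directly in Blender, giving realistic meshes and ray-traced visuals.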
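
The dispatch, conceptualize, and model flow and the memory of earlier edits can be sketched as a small orchestration loop. This is a minimal sketch under stated assumptions:
- The three agents are keyword-matching stubs; the paper prompts an LLM for each.
- `add_flowers` and `set_sky` are hypothetical stand-ins for documented Infinigen functions.
- The output is a set of call strings that a Blender-side script would execute, not real `bpy` code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical function documentation shown to the agents. In 3D-GPT this role
# is played by documented Infinigen functions; these names are placeholders.
FUNCTION_DOCS: Dict[str, str] = {
    "add_flowers": "Scatter flowers. Params: count (int), color (str).",
    "set_sky": "Configure sky lighting. Params: time_of_day (str).",
}

# Each agent is a plain callable here; a real system would prompt an LLM.
Dispatcher = Callable[[str, Dict[str, str]], List[str]]
Conceptualizer = Callable[[str, str], str]
Modeler = Callable[[str, str, dict], dict]


@dataclass
class SceneMemory:
    """Latest parameters per function, so follow-up edits build on earlier ones."""
    params: Dict[str, dict] = field(default_factory=dict)


def run_instruction(instruction: str, memory: SceneMemory, dispatch: Dispatcher,
                    conceptualize: Conceptualizer, model: Modeler) -> str:
    """Dispatch, enrich, infer parameters; return one call string per function."""
    lines = []
    for fn in dispatch(instruction, FUNCTION_DOCS):
        doc = FUNCTION_DOCS[fn]
        enriched = conceptualize(instruction, doc)
        prior = memory.params.get(fn, {})
        # New values override remembered ones; untouched parameters persist.
        memory.params[fn] = {**prior, **model(enriched, doc, prior)}
        args = ", ".join(f"{k}={v!r}" for k, v in memory.params[fn].items())
        lines.append(f"{fn}({args})")
    return "\n".join(lines)


# --- Toy keyword-based agents standing in for LLM calls (hypothetical) ---
def keyword_dispatch(instruction: str, docs: Dict[str, str]) -> List[str]:
    text = instruction.lower()
    picked = []
    if "flower" in text:
        picked.append("add_flowers")
    if "sunset" in text or "night" in text:
        picked.append("set_sky")
    return [fn for fn in picked if fn in docs]


def echo_conceptualize(instruction: str, doc: str) -> str:
    return instruction  # a real conceptualization agent would enrich this text


def keyword_model(enriched: str, doc: str, prior: dict) -> dict:
    text = enriched.lower()
    if doc.startswith("Scatter flowers"):
        out = {}
        if "many" in text:
            out["count"] = 200
        for color in ("red", "yellow", "white"):
            if color in text:
                out["color"] = color
        return out
    return {"time_of_day": "night" if "night" in text else "sunset"}


memory = SceneMemory()
agents = (keyword_dispatch, echo_conceptualize, keyword_model)
print(run_instruction("Many white flowers at sunset", memory, *agents))
# add_flowers(count=200, color='white')
# set_sky(time_of_day='sunset')
print(run_instruction("Make the flowers red", memory, *agents))
# add_flowers(count=200, color='red')
```

The second instruction only mentions a color, yet the emitted call keeps `count=200` from memory. In this sketch, that is how follow-up editing reduces to a dictionary merge over remembered parameters.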

Experimental Results

Research Questions

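RQ1: Can a multi-agent LLM system interpret natural-language instructions to drive procedural 3D generation in Blender?
RQ2: Do the conceptualization and task dispatch modules improve alignment, parameter diversity, and success rate in 3D generation tasks?
RQ3: Is it feasible to extract function parameters from enriched text and control Blender through Python scripts?
RQ4: How well does the system support follow-up instructions and remember previous edits?
RQ5: What are the limitations, and what directions could improve LLM-based 3D modeling?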

Key Findings

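Setting	CLIP score ↑	Failure rate ↓	Parameter diversity ↑
w/o TDA	22.79	3.6%	6.32
Ours (with TDA)	29.16	0.8%	7.34
w/o CA	21.51	3.6%	6.32
Ours (with CA)	30.30	0.8%	7.34
(TDA = task dispatch agent; CA = conceptualization agent; ↑ higher is better, ↓ lower is better.)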
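
The table's CLIP score measures how well a render matches its text prompt. The failure rate is assumed here to be the share of attempts whose generated script fails. A common way to compute a CLIP score is 100 times the cosine similarity between CLIP image and text embeddings, clipped at zero. This is the convention used by torchmetrics' `CLIPScore`. The summary does not say which variant or CLIP backbone the paper uses, so treat this scaling as an assumption. Parameter diversity is not defined in this summary and is left out. The sketch takes precomputed embeddings as plain lists. `clip_style_score`, `mean_clip_score`, and `failure_rate` are hypothetical helpers.

```python
import math


def clip_style_score(image_emb, text_emb):
    """100 * cosine similarity of image and text embeddings, floored at 0.

    Follows the torchmetrics CLIPScore convention (an assumption for this
    paper); real embeddings would come from CLIP's image and text encoders.
    """
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(sum(b * b for b in text_emb))
    return max(100.0 * dot / norm, 0.0)


def mean_clip_score(pairs):
    """Average score over (image_emb, text_emb) pairs, e.g. one pair per prompt."""
    return sum(clip_style_score(i, t) for i, t in pairs) / len(pairs)


def failure_rate(succeeded):
    """Percentage of generation attempts (booleans) whose script failed."""
    failures = sum(1 for ok in succeeded if not ok)
    return 100.0 * failures / len(succeeded)


print(clip_style_score([3.0, 4.0], [4.0, 3.0]))  # 96.0
print(failure_rate([True, True, True, False]))   # 25.0
```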

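The 3D-GPT framework generates Blender-controlled 3D content that aligns with both the initial and the follow-up text instructions.
In the ablation study, removing the task dispatch agent lowers the CLIP score (29.16 → 22.79) and raises the failure rate (0.8% → 3.6%). This confirms the agent's role in managing the instruction flow.
Removing the conceptualization agent lowers both the CLIP score (30.30 → 21.51) and parameter diversity (7.34 → 6.32), and it raises the failure rate (0.8% → 3.6%). This underscores the agent's importance for parameter inference and description refinement.
The system supports large-scene generation and fine-grained control over individual objects such as flowers, with accurate parameter inference for shape, color, and appearance.
Follow-up editing with memory keeps successive edits consistent and avoids the extra networks that controllable editing would otherwise require.
Because the workflow renders results directly in Blender, it gets realistic ray tracing and 3D consistency.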
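
The ablation findings above reduce to simple differences between rows of the results table. The short sketch below recomputes them from the reported numbers; the dictionary layout and the name `ablation_deltas` are illustrative. Removing either agent costs the same 2.8 percentage points of failure rate and 1.02 points of parameter diversity. The CLIP drop is larger for the conceptualization agent (8.79 vs. 6.37).

```python
# (CLIP score, failure rate in %, parameter diversity), copied from the results table.
ABLATION = {
    "task dispatch agent (TDA)": {"without": (22.79, 3.6, 6.32), "with": (29.16, 0.8, 7.34)},
    "conceptualization agent (CA)": {"without": (21.51, 3.6, 6.32), "with": (30.30, 0.8, 7.34)},
}


def ablation_deltas(rows):
    """How much the full system improves on the ablated one, rounded to 2 decimals."""
    clip_wo, fail_wo, div_wo = rows["without"]
    clip_w, fail_w, div_w = rows["with"]
    return {
        "clip_gain": round(clip_w - clip_wo, 2),
        "failure_drop_pp": round(fail_wo - fail_w, 2),  # percentage points
        "diversity_gain": round(div_w - div_wo, 2),
    }


for agent, rows in ABLATION.items():
    print(agent, ablation_deltas(rows))
# task dispatch agent (TDA) {'clip_gain': 6.37, 'failure_drop_pp': 2.8, 'diversity_gain': 1.02}
# conceptualization agent (CA) {'clip_gain': 8.79, 'failure_drop_pp': 2.8, 'diversity_gain': 1.02}
```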

This review was generated by AI and checked by human editors.