QUICK REVIEW

[Paper Review] SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning

Krishan Rana, Jesse Haviland|arXiv (Cornell University)|Jul 12, 2023

Multimodal Machine Learning Applications | Cited by 29

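One-Sentence Summary

SayPlan grounds LLM-based robot task planning in large 3D scene graphs by combining semantic subgraph search, a classical path planner for navigation, and iterative replanning against a scene graph simulator, so that the resulting plans are executable in multi-floor environments.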

ABSTRACT

Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for robotics. We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics using 3D scene graph (3DSG) representations. To ensure the scalability of our approach, we: (1) exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a 'semantic search' for task-relevant subgraphs from a smaller, collapsed representation of the full graph; (2) reduce the planning horizon for the LLM by integrating a classical path planner and (3) introduce an 'iterative replanning' pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures. We evaluate our approach on two large-scale environments spanning up to 3 floors and 36 rooms with 140 assets and objects and show that our approach is capable of grounding large-scale, long-horizon task plans from abstract, and natural language instruction for a mobile manipulator robot to execute. We provide real robot video demonstrations on our project page https://sayplan.github.io.

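Motivation and Objectives

Address the challenge of grounding long-horizon LLM plans in large-scale, multi-room, multi-floor environments.
Exploit hierarchical 3D scene graphs (3DSGs) to enable semantic search for task-relevant subgraphs.
Shorten the LLM's planning horizon by delegating path planning to a classical planner.
Introduce an iterative replanning loop, driven by a scene graph simulator, to ensure plan feasibility.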
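
Delegating navigation to a classical planner can be illustrated with a small sketch. The code below is not SayPlan's implementation; it is a plain Dijkstra search over a hypothetical connectivity graph (the node names, edge costs, and the `shortest_path` helper are all assumptions made for illustration). The point is that the LLM would only need to name a goal such as "lab", while the intermediate waypoints are filled in without consuming any of its planning horizon.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra over a dict-of-dicts graph where graph[u][v] is the edge cost."""
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for nxt, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(queue, (nd, nxt))
    if goal not in dist:
        return None  # goal unreachable or unknown
    # Walk the predecessor links back from the goal to the start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical connectivity between locations on two floors (illustrative only).
POSES = {
    "kitchen": {"corridor_1": 2.0},
    "corridor_1": {"kitchen": 2.0, "stairs": 5.0, "office_a": 3.0},
    "stairs": {"corridor_1": 5.0, "corridor_2": 4.0},
    "corridor_2": {"stairs": 4.0, "lab": 2.5},
    "office_a": {"corridor_1": 3.0},
    "lab": {"corridor_2": 2.5},
}

# A high-level step "go to lab" expands into concrete waypoints.
route = shortest_path(POSES, "kitchen", "lab")
```

Here `route` lists every intermediate node between the two rooms, so a single high-level navigation step in the LLM's plan stands in for the whole traversal.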

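Proposed Method

Represent the environment as a hierarchical 3D scene graph (3DSG), serialized as JSON for input to the LLM.
Collapse the 3DSG into a high-level view, then run a semantic search through LLM-directed expand/contract operations to identify the task-relevant subgraph G′.
Connect high-level waypoints with a classical path planner (e.g., Dijkstra), relieving the LLM of navigation planning.
Refine the plan iteratively using feedback from the scene graph simulator, correcting infeasible actions and violated predicates until the plan is executable.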
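
The collapse and expand/contract steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the toy scene, the `CollapsedGraph` class, and its method names are hypothetical, and a scripted command list stands in for the LLM's choices. It shows how only the expanded nodes contribute content to the JSON that would be sent to the model.

```python
import json

# Hypothetical toy scene: floors -> rooms -> assets/objects (illustrative only).
SCENE = {
    "floor_1": {
        "kitchen": ["fridge", "coffee_machine", "mug"],
        "office_a": ["desk", "stapler"],
    },
    "floor_2": {
        "lab": ["workbench", "screwdriver"],
    },
}

class CollapsedGraph:
    """Tracks which floor/room nodes are expanded and renders the visible view."""

    def __init__(self, scene):
        self.scene = scene
        self.expanded = set()  # names of nodes whose children are visible

    def expand(self, node):
        self.expanded.add(node)

    def contract(self, node):
        self.expanded.discard(node)

    def view(self):
        """Return the visible subgraph; hidden children appear as 'collapsed'."""
        out = {}
        for floor, rooms in self.scene.items():
            if floor not in self.expanded:
                out[floor] = "collapsed"
                continue
            out[floor] = {
                room: (list(items) if room in self.expanded else "collapsed")
                for room, items in rooms.items()
            }
        return out

    def to_json(self):
        return json.dumps(self.view(), sort_keys=True)

# Scripted stand-in for LLM decisions while searching for a mug:
# open floor 1, peek into the office, close it again, then open the kitchen.
g = CollapsedGraph(SCENE)
for command, node in [
    ("expand", "floor_1"),
    ("expand", "office_a"),
    ("contract", "office_a"),
    ("expand", "kitchen"),
]:
    getattr(g, command)(node)

subgraph_json = g.to_json()
```

After the scripted search, `subgraph_json` contains the kitchen's contents while the office and the second floor remain collapsed, which is the kind of reduced subgraph G′ the planner would then work from.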

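Experimental Results

Research Questions

RQ1: How can LLMs effectively search and reason over large-scale 3DSGs to identify the subgraph relevant to a given instruction?
RQ2: Does combining a classical path planner with iterative simulator feedback yield executable plans for a mobile manipulator in multi-floor environments?
RQ3: How does semantic graph collapsing affect LLM token efficiency and planning scalability?
RQ4: What are the failure modes of LLM-generated plans in large environments, and how does iterative replanning mitigate them?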

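Key Findings

The SayPlan pipeline achieves scalable, grounded task planning in environments with up to 3 floors and 36 rooms.
Semantic search over the collapsed 3DSG reduces the token count for large environments by up to roughly 82%, so that the LLM can parse the scene representation.
Iterative replanning with the scene graph simulator corrects infeasible actions and enforces environment predicates, yielding plans that are close to fully executable.
SayPlan demonstrates high executability and feasibility in real-robot demonstrations with a mobile manipulator.
Compared with the baselines, SayPlan's combination of subgraph search, path planning, and the feedback loop reduces hallucinations as well as navigation and manipulation errors.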
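
The iterative replanning finding can be made concrete with a small sketch. Everything here is an assumption for illustration: the three-action vocabulary, the `simulate` precondition checks, and `scripted_planner`, which is a hard-coded stand-in for an LLM call. The sketch only shows the shape of the loop: propose a plan, check it in a simulator, and feed the textual error back until the plan passes.

```python
def simulate(plan, inside, opened=()):
    """Step through a plan and return (ok, feedback).

    `inside` maps an object to the container holding it; picking an object
    from a closed container is treated as infeasible.
    """
    opened = set(opened)
    holding = None
    for step, (action, target) in enumerate(plan):
        if action == "open":
            opened.add(target)
        elif action == "close":
            opened.discard(target)
        elif action == "pick":
            container = inside.get(target)
            if container is not None and container not in opened:
                return False, f"step {step}: cannot pick {target}, {container} is closed"
            if holding is not None:
                return False, f"step {step}: cannot pick {target}, already holding {holding}"
            holding = target
        else:
            return False, f"step {step}: unknown action {action}"
    return True, "ok"

def replan_until_executable(propose, inside, max_iters=5):
    """Ask `propose` for plans, passing simulator feedback, until one passes."""
    feedback = None
    for attempt in range(1, max_iters + 1):
        plan = propose(feedback)
        ok, feedback = simulate(plan, inside)
        if ok:
            return plan, attempt
    return None, max_iters

def scripted_planner(feedback):
    """Hard-coded stand-in for an LLM: naive first plan, corrected second plan."""
    if feedback is None:
        return [("pick", "milk")]
    return [("open", "fridge"), ("pick", "milk"), ("close", "fridge")]

plan, attempts = replan_until_executable(scripted_planner, {"milk": "fridge"})
```

In this toy run the first proposal fails because the fridge is closed, and the second proposal, produced after seeing that feedback, passes the simulator.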

This review was generated by AI and checked by a human editor.