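[Paper Explained] Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
Large language models can generate plausible-looking high-level action plans for embodied tasks without any additional training, but these plans are often not executable. This paper proposes a method that translates and corrects the plans to improve their executability in VirtualHome. Executability rises substantially, at some cost to semantic correctness.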
Can world knowledge learned by large language models (LLMs) be used to act in interactive environments? In this paper, we investigate the possibility of grounding high-level tasks, expressed in natural language (e.g. "make breakfast"), to a chosen set of actionable steps (e.g. "open fridge"). While prior work focused on learning from explicit step-by-step examples of how to act, we surprisingly find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into mid-level plans without any further training. However, the plans produced naively by LLMs often cannot map precisely to admissible actions. We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions. Our evaluation in the recent VirtualHome environment shows that the resulting method substantially improves executability over the LLM baseline. The conducted human evaluation reveals a trade-off between executability and correctness but shows a promising sign towards extracting actionable knowledge from language models. Website at https://huangwl18.github.io/language-planner
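Motivation and Objectives
- Show that pre-trained large language models can decompose high-level tasks into mid-level plans without additional training.
- Evaluate how executable these LLM-generated plans are in an embodied household environment.
- Develop and evaluate methods that, at inference time, translate free-form plans into actions the environment accepts and correct the trajectory.
- Quantify the trade-off between executability and semantic correctness in embodied planning.
- Offer guidance for grounding actionable knowledge from LLMs in embodied agents.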
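Proposed Method
- Query a pre-trained LLM with the high-level task name and a demonstration example to generate an action plan.
- Translate free-form plan phrases into admissible actions using semantic embeddings (the Translation LM).
- Generate and translate steps autoregressively to keep the plan feasible, correcting the trajectory as each step is produced.
- Dynamically select the example task, prompting the LLM with the most similar task from the demonstration set.
- Evaluate executability in VirtualHome and semantic correctness through human evaluation; also report LCS-based similarity to human-written plans.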
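The translation, example-selection, and step-by-step generation ideas above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model, `scripted_llm` is a hypothetical stub in place of an actual LLM call, and the action list, demonstration tasks, and `min_score` threshold are invented for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return (best_candidate, score) by cosine similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in candidates]
    score, best = max(scored, key=lambda pair: pair[0])
    return best, score


def plan(task, propose_step, admissible, min_score=0.3, max_steps=10):
    """Build a plan one step at a time, translating each free-form step
    to the closest admissible action before asking for the next one."""
    steps = []
    for _ in range(max_steps):
        free_form = propose_step(task, steps)  # hypothetical LLM call
        if free_form is None:
            break
        action, score = most_similar(free_form, admissible)
        if score < min_score:
            break  # no admissible action is close enough, so stop here
        steps.append(action)
    return steps


# Hypothetical data, invented for illustration only.
ADMISSIBLE_ACTIONS = ["open fridge", "grab milk", "walk to kitchen", "switch on stove"]
DEMO_TASKS = ["make breakfast", "brush teeth", "watch tv"]


def scripted_llm(task, steps_so_far):
    """Stub that replays fixed free-form steps instead of calling a model."""
    script = ["walk into the kitchen", "open the fridge door", "grab some milk"]
    if len(steps_so_far) < len(script):
        return script[len(steps_so_far)]
    return None


# Dynamic example selection: pick the demonstration closest to the query task.
example_task, _ = most_similar("make lunch", DEMO_TASKS)

# Autoregressive generation with per-step translation.
translated_plan = plan("get milk", scripted_llm, ADMISSIBLE_ACTIONS)
```

The same `most_similar` helper serves both purposes here: matching a free-form step to an admissible action and matching a query task to a demonstration. In practice the toy `embed` would be replaced by a learned sentence encoder, since word overlap alone misses paraphrases.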
Experimental Results
Research Questions
- RQ1: Can large language models generate meaningful mid-level action plans for high-level tasks without additional training?
- RQ2: To what extent are these plans executable in an embodied environment, and how can executability be improved without retraining the model?
- RQ3: Does semantic translation of plans into admissible actions improve grounding in embodied agents, and what trade-offs emerge with correctness?
- RQ4: How does dynamic demonstration selection influence knowledge extraction for planning?
- RQ5: What is the impact of autoregressive trajectory correction on plan validity and grounding?
Key Findings
- LLMs can produce highly plausible action plans for high-level tasks without training, sometimes surpassing human-written plans in perceived correctness.
- Naively generated plans are often not executable due to mismatches with admissible actions and ambiguity.
- Translating plan steps into admissible actions via a Translation LM significantly increases executability (from 18% to 79% in their setup).
- Translation improves alignment with environment syntax and increases LCS-based similarity to human plans, but can reduce perceived correctness due to translation errors or incomplete environment support.
- Autoregressive trajectory correction and dynamic example selection further bolster executability and grounding, though there remains a gap compared to human-level execution.
- The approach achieves notable executability gains with no model parameter updates, enabling integration into existing pipelines.
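The LCS-based similarity mentioned in the findings can be illustrated with a short sketch. It treats each plan as a list of step strings and divides the length of the longest common subsequence by the length of the longer plan. That normalization, the function names, and the example plans are assumptions made for illustration; the summary above does not spell out the exact formula.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two step lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def normalized_lcs(plan, reference):
    """LCS length divided by the longer plan's length (assumed normalization).

    Two empty plans are scored 0.0 here, a choice made only to avoid
    dividing by zero.
    """
    longest = max(len(plan), len(reference))
    if longest == 0:
        return 0.0
    return lcs_length(plan, reference) / longest


# Hypothetical plans, invented for illustration only.
generated = ["walk to kitchen", "open fridge", "grab milk"]
human = ["walk to kitchen", "open fridge", "grab milk", "close fridge"]

score = normalized_lcs(generated, human)  # 3 shared steps / 4 = 0.75
```

Because a subsequence only requires matching steps to appear in the same order, a plan that skips or inserts a step still earns partial credit, which exact-match comparison would not give.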
This explainer was generated by AI and reviewed by a human editor.