QUICK REVIEW

[Paper Review] AdaPlanner: Adaptive Planning from Feedback with Language Models

Haotian Sun, Yuchen Zhuang | arXiv (Cornell University) | May 26, 2023

Topic Modeling | Cited by 13

One-Sentence Summary

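AdaPlanner introduces an explicit closed-loop planning framework in which an LLM acts as both planner and refiner. It combines in-plan and out-of-plan refinement with code-style prompting and skill discovery to improve sample efficiency and adaptability on ALFWorld and MiniWoB++.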

ABSTRACT

Large language models (LLMs) have recently demonstrated the potential in acting as autonomous agents for sequential decision-making tasks. However, most existing methods either take actions greedily without planning or rely on static plans that are not adaptable to environmental feedback. Consequently, the sequential decision-making performance of LLM agents degenerates as problem complexity and plan horizons increase. We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback. In AdaPlanner, the LLM agent adaptively refines its plan from feedback with both in-plan and out-of-plan refinement strategies. To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities. Furthermore, we propose a skill discovery mechanism that leverages successful plans as few-shot exemplars, enabling the agent to plan and refine with fewer task demonstrations. Our experiments in the ALFWorld and MiniWoB++ environments demonstrate that AdaPlanner outperforms state-of-the-art baselines by 3.73% and 4.11% while utilizing 2x and 600x fewer samples, respectively.
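The skill discovery mechanism described in the abstract can be illustrated with a minimal sketch, assuming a simple in-memory store keyed by task type: plans that succeed are archived and then offered as few-shot exemplars ahead of human demonstrations. The class `SkillMemory`, its methods, the discovered-first ordering, and the prompt layout are all hypothetical illustrations, not AdaPlanner's actual code.

```python
from collections import defaultdict
from typing import Dict, List


class SkillMemory:
    """Hypothetical skill store: successful plans are reused as few-shot exemplars."""

    def __init__(self, demos: Dict[str, List[str]]) -> None:
        # Human-written demonstrations, keyed by task type (e.g. "heat", "clean").
        self._demos = {k: list(v) for k, v in demos.items()}
        # Plans discovered by the agent itself, keyed the same way.
        self._skills: Dict[str, List[str]] = defaultdict(list)

    def record_success(self, task_type: str, plan_code: str) -> None:
        """Archive a plan that solved a task, skipping exact duplicates."""
        if plan_code not in self._skills[task_type]:
            self._skills[task_type].append(plan_code)

    def exemplars(self, task_type: str, k: int = 2) -> List[str]:
        """Return up to k exemplars, preferring discovered skills over demos (assumed policy)."""
        pool = self._skills.get(task_type, []) + self._demos.get(task_type, [])
        return pool[:k]

    def build_prompt(self, task_type: str, task: str) -> str:
        """Prepend the exemplars to a code-style prompt for a new task."""
        shots = "\n\n".join(self.exemplars(task_type))
        return f"{shots}\n\n# Task: {task}\ndef solution(agent):\n"


memory = SkillMemory({"heat": ["# demo: heat an egg ..."]})
memory.record_success("heat", "# plan that heated a mug ...")
print(memory.exemplars("heat"))  # discovered plan first, then the demo
```

Because discovered plans fill the exemplar slots first, the agent needs fewer human demonstrations once it has solved a few tasks of a given type; a real system would likely also decide which archived plans are worth keeping.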

Research Motivation and Goals

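Address the limitations of LLM agents in text-based environments that either follow fixed, open-loop plans or use feedback only implicitly in a closed loop.
Develop an explicit closed-loop framework in which a single LLM performs both planning and refinement.
Mitigate hallucination with code-style prompts, and improve sample efficiency through skill discovery.
Enable fast plan adjustment through in-plan queries (ask_LLM), and plan revision through out-of-plan refinement that revises the plan and then resumes execution (refine-then-resume).
Demonstrate state-of-the-art performance on ALFWorld and MiniWoB++ while using fewer demonstrations.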
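The in-plan refinement idea can be sketched in the code-style format the paper advocates, assuming a toy text environment and an ask_LLM-like callable: the plan's control flow stays fixed, and the LLM is only queried to pull a concrete value (here an object id) out of an observation so that later actions can use it. `find_and_take`, `toy_llm`, `toy_step`, and `TOY_ROOM` are hypothetical stand-ins, and `toy_llm` uses a regex in place of a real model call.

```python
import re
from typing import Callable, List

# Hypothetical toy environment: observation returned after "go to <location>".
TOY_ROOM = {
    "cabinet 1": "The cabinet 1 is open. In it, you see a plate 1.",
    "countertop 1": "On the countertop 1, you see a knife 1, and a mug 2.",
}


def toy_step(action: str) -> str:
    """Hypothetical environment step: only 'go to <location>' is modeled."""
    return TOY_ROOM.get(action.replace("go to ", "", 1), "Nothing happens.")


def toy_llm(prompt: str) -> str:
    """Regex stand-in for an LLM answering 'Find <target> in: <observation>'."""
    _, _, rest = prompt.partition("Find ")
    target, _, observation = rest.partition(" in: ")
    match = re.search(rf"\b{re.escape(target)} \d+", observation)
    return match.group(0) if match else "none"


def find_and_take(target: str, locations: List[str],
                  step: Callable[[str], str], ask: Callable[[str], str]) -> List[str]:
    """Code-style sub-plan: visit locations and take the first matching object.

    Returns the action trace; the final 'take' is recorded but not sent to the
    toy environment, which does not model it.
    """
    trace: List[str] = []
    for loc in locations:
        trace.append(f"go to {loc}")
        observation = step(f"go to {loc}")
        # In-plan refinement: the LLM extracts the concrete object id from the
        # observation; the surrounding plan is not regenerated.
        obj = ask(f"Find {target} in: {observation}")
        if obj != "none":
            trace.append(f"take {obj} from {loc}")
            break
    return trace


print(find_and_take("mug", ["cabinet 1", "countertop 1"], toy_step, toy_llm))
```

The plan itself never changes here; only the argument of the last action is filled in from feedback, which is what keeps this kind of refinement cheap compared with rewriting the plan.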

Proposed Method

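Sub-goal decomposition through Python-style code prompts; after a refinement, execution resumes from the breakpoint in the refined plan.
Explicit closed-loop refinement: in-plan refinement (ask_LLM) extracts useful information from observations and updates upcoming actions, while out-of-plan refinement replaces the whole plan when a prediction about the environment fails.
Skill memory: successful plans are stored and reused as few-shot exemplars to make planning more efficient.
Environment interaction strategy: the plan is checked at N critical points, and refinement is triggered only when a discrepancy is detected, which reduces API calls.
Shows that a code interface substantially reduces LLM hallucination compared with natural-language prompts.
Formally distinguishes open-loop, implicit closed-loop, and explicit closed-loop planning systems, and classifies AdaPlanner as explicit closed-loop.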
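Out-of-plan refinement with a discrepancy-triggered refiner can be sketched as follows, assuming each plan step carries a predicted substring of its observation (a simplification of the assertion-style checks a code-style plan can contain). The refiner is called only when a prediction fails; it rewrites the steps after the breakpoint, and execution resumes there instead of restarting. `Step`, `run_with_refinement`, `toy_refiner`, and the toy world are hypothetical; `toy_refiner` hard-codes the fix that an LLM would be asked to produce.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    action: str
    expect: str  # substring the observation is predicted to contain (assumed check)


# Hypothetical toy world: action -> observation.
WORLD = {
    "go to countertop 1": "You arrive at countertop 1. You see a knife 1.",
    "go to sinkbasin 1": "You arrive at sinkbasin 1. You see a mug 1.",
    "take mug 1 from sinkbasin 1": "You pick up the mug 1.",
    "go to coffeemachine 1": "You arrive at coffeemachine 1.",
}


def toy_env(action: str) -> str:
    """Return the observation for an action, or a failure message."""
    return WORLD.get(action, "Nothing happens.")


def toy_refiner(failed: Step, observation: str, remaining: List[Step]) -> List[Step]:
    """Stand-in for an LLM refiner: rewrites the steps after the breakpoint."""
    revised = [Step("go to sinkbasin 1", "mug 1"),
               Step("take mug 1 from sinkbasin 1", "You pick up")]
    # Keep later steps that did not depend on the wrong location.
    return revised + [s for s in remaining if not s.action.startswith("take mug 1")]


def run_with_refinement(plan: List[Step], env: Callable[[str], str],
                        refine: Callable[[Step, str, List[Step]], List[Step]],
                        max_refinements: int = 3) -> Tuple[List[str], int]:
    """Execute a plan; refine only on a prediction mismatch, then resume."""
    plan = list(plan)
    executed: List[str] = []
    refinements, i = 0, 0
    while i < len(plan):
        step = plan[i]
        observation = env(step.action)
        executed.append(step.action)
        if step.expect not in observation:
            # Discrepancy between prediction and feedback: refine out of plan.
            if refinements >= max_refinements:
                raise RuntimeError(f"gave up after {refinements} refinements")
            refinements += 1
            # Refine-then-resume: keep steps up to the breakpoint, replace the rest.
            plan = plan[: i + 1] + refine(step, observation, plan[i + 1:])
        i += 1
    return executed, refinements


initial_plan = [
    Step("go to countertop 1", "mug 1"),
    Step("take mug 1 from countertop 1", "You pick up"),
    Step("go to coffeemachine 1", "You arrive"),
]
print(run_with_refinement(initial_plan, toy_env, toy_refiner))
```

In this toy run the mug is not on the countertop, so the first check fails, the refiner is called once, and the agent continues to the sink and then the coffee machine without repeating the first step. When every prediction holds, the refiner is never called, which mirrors the API-call savings described above.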

Experimental Results

Research Questions

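RQ1: How can an LLM-based agent use environmental feedback to adapt its plan in real time without retraining?
RQ2: Can code-based prompting reduce hallucination and improve the planning reliability of LLM agents?
RQ3: Does skill discovery from successful plans improve long-horizon planning efficiency and sample efficiency?
RQ4: How does explicit plan refinement (in-plan and out-of-plan) affect task success rate and sample efficiency on ALFWorld and MiniWoB++?
RQ5: How does AdaPlanner compare with state-of-the-art baselines across different sample sizes?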

Key Findings

ALFWorld success rate (%) by task type
Method	Pick	Clean	Heat	Cool	Examine	Pick Two	All (134 tasks)
BUTLER	46.00	39.00	74.00	100.00	22.00	24.00	37.00
ReAct (GPT-3)	66.67	41.94	91.03	80.95	55.56	35.29	61.94
ReAct (GPT-3.5)	37.50	64.52	69.57	42.86	38.89	17.65	47.76
Reflexion (GPT-3)	75.00	90.32	91.30	90.48	88.89	94.12	88.06
Reflexion (GPT-3.5)	50.00	41.94	65.22	52.38	66.67	47.06	52.99
AdaPlanner (GPT-3)	100.00	96.77	95.65	100.00	100.00	47.06	91.79
AdaPlanner (GPT-3.5)	77.78	93.55	69.57	93.65	62.96	78.43	80.60
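The "All" column is a task-weighted average, not a plain mean of the six task types. A quick check, assuming the 134 tasks split by type as 24 / 31 / 23 / 21 / 18 / 17 (these counts are not given in the table; they are assumed here because they reproduce its percentages): weighting each rate by its task count recovers the reported overall rates, and the gap between AdaPlanner (GPT-3) and Reflexion (GPT-3) works out to 3.73 points, consistent with the ALFWorld improvement quoted in the abstract.

```python
from typing import Dict

# Assumed number of ALFWorld tasks per type (sums to 134).
TASK_COUNTS: Dict[str, int] = {
    "pick": 24, "clean": 31, "heat": 23, "cool": 21, "examine": 18, "pick_two": 17,
}

# Per-type success rates (%) copied from the table above.
ROWS: Dict[str, Dict[str, float]] = {
    "Reflexion (GPT-3)": {"pick": 75.00, "clean": 90.32, "heat": 91.30,
                          "cool": 90.48, "examine": 88.89, "pick_two": 94.12},
    "AdaPlanner (GPT-3)": {"pick": 100.00, "clean": 96.77, "heat": 95.65,
                           "cool": 100.00, "examine": 100.00, "pick_two": 47.06},
    "AdaPlanner (GPT-3.5)": {"pick": 77.78, "clean": 93.55, "heat": 69.57,
                             "cool": 93.65, "examine": 62.96, "pick_two": 78.43},
}


def overall_rate(rates: Dict[str, float]) -> float:
    """Task-weighted average success rate, in percent."""
    total = sum(TASK_COUNTS.values())
    return sum(rates[t] * n for t, n in TASK_COUNTS.items()) / total


def macro_rate(rates: Dict[str, float]) -> float:
    """Unweighted mean over task types, for comparison."""
    return sum(rates.values()) / len(rates)


for name, rates in ROWS.items():
    print(f"{name}: weighted {overall_rate(rates):.2f}, unweighted {macro_rate(rates):.2f}")

gap = overall_rate(ROWS["AdaPlanner (GPT-3)"]) - overall_rate(ROWS["Reflexion (GPT-3)"])
print(f"gap: {gap:.2f} points")
```

The weighted values come out to 88.06, 91.79, and 80.60, matching the "All" column, whereas the unweighted mean for AdaPlanner (GPT-3) is only 89.91 because its weakest category (Pick Two) is also one of the smallest.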

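AdaPlanner achieves state-of-the-art success rates in the setting with environmental feedback: 91.79% on ALFWorld and 91.11% on MiniWoB++.
Relative to the state-of-the-art baselines it outperforms, AdaPlanner uses about 2x fewer samples on ALFWorld and about 600x fewer on MiniWoB++.
Code-based prompting substantially reduces hallucination and improves performance compared with natural-language prompting.
Skill discovery substantially improves sample efficiency and task success rate in both environments.
Explicit closed-loop plan refinement consistently outperforms implicit closed-loop and fixed-plan approaches.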


This review was generated by AI and checked by a human editor.