QUICK REVIEW

[Paper Review] Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting

Marta Skreta, Naruki Yoshikawa|arXiv (Cornell University)|Mar 24, 2023

Topic Modeling | Cited by 18

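One-Sentence Summary

CLAIRify uses verifier-assisted iterative prompting to generate syntactically valid domain-specific task plans from natural language, outperforming the baseline and enabling execution on a real robot.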

ABSTRACT

Generating low-level robot task plans from high-level natural language instructions remains a challenging problem. Although large language models have shown promising results in generating plans, the accuracy of the output remains unverified. Furthermore, the lack of domain-specific language data poses a limitation on the applicability of these models. In this paper, we propose CLAIRIFY, a novel approach that combines automatic iterative prompting with program verification to ensure programs written in data-scarce domain-specific language are syntactically valid and incorporate environment constraints. Our approach provides effective guidance to the language model on generating structured-like task plans by incorporating any errors as feedback, while the verifier ensures the syntactic accuracy of the generated plans. We demonstrate the effectiveness of CLAIRIFY in planning chemistry experiments by achieving state-of-the-art results. We also show that the generated plans can be executed on a real robot by integrating them with a task and motion planner.

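Research Motivation and Goals

Address the lack of verification for task plans that large language models generate in domain-specific languages (DSLs).
Mitigate DSL data scarcity through in-context learning from a natural-language description of the target DSL.
Ensure that generated plans are syntactically valid and comply with environment constraints.
Demonstrate execution of the generated plans by integrating them with a task and motion planner (TAMP).
Show improved performance over the prior XDL generation method on chemistry datasets.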

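Proposed Method

Provide the LLM with a description of the target DSL in a zero-shot prompt.
Iteratively generate structured-language output and check it with a rule-based verifier.
Feed syntax and constraint errors back to the LLM so that it can correct them in later iterations.
Include environment constraints in both the prompt and the verifier to filter out infeasible plans.
Use a TAMP framework to convert verified DSL plans into low-level actions for robot execution.
Present results on the Chemical Description Language (XDL) and in real-robot experiments.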
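
The generate, verify, and re-prompt loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ALLOWED_ACTIONS` rule set is a toy stand-in for the real XDL specification, the prompt wording is invented, and `llm` is any hypothetical callable that maps a prompt string to a completion string.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set

# Toy rule set (assumption): action name -> properties it accepts.
# The real XDL specification defines many more actions and properties.
ALLOWED_ACTIONS = {
    "Add": {"vessel", "reagent", "amount"},
    "Stir": {"vessel", "time"},
}


def verify(plan: str, hardware: Set[str], reagents: Set[str]) -> List[str]:
    """Rule-based check. Returns error messages; an empty list means valid."""
    try:
        root = ET.fromstring(plan)
    except ET.ParseError as exc:
        return [f"Plan is not well-formed XML: {exc}"]
    procedure = root.find("Procedure")
    if procedure is None:
        return ["Missing <Procedure> section."]
    errors = []
    for step in procedure:
        if step.tag not in ALLOWED_ACTIONS:
            errors.append(f"Unknown action '{step.tag}'.")
            continue
        # Syntax check: every property must be defined for this action.
        for attr in step.attrib:
            if attr not in ALLOWED_ACTIONS[step.tag]:
                errors.append(f"Action '{step.tag}' has no property '{attr}'.")
        # Environment constraints: only available hardware and reagents.
        vessel = step.get("vessel")
        if vessel is not None and vessel not in hardware:
            errors.append(f"Vessel '{vessel}' is not available.")
        reagent = step.get("reagent")
        if reagent is not None and reagent not in reagents:
            errors.append(f"Reagent '{reagent}' is not available.")
    return errors


def generate_plan(
    instruction: str,
    dsl_description: str,
    llm: Callable[[str], str],
    hardware: Set[str],
    reagents: Set[str],
    max_iters: int = 5,
) -> Optional[str]:
    """Re-prompt with verifier errors until the plan passes or we give up."""
    base_prompt = f"{dsl_description}\n\nConvert to the DSL:\n{instruction}"
    prompt = base_prompt
    for _ in range(max_iters):
        candidate = llm(prompt)
        errors = verify(candidate, hardware, reagents)
        if not errors:
            return candidate
        # The errors themselves become part of the next prompt.
        prompt = (
            f"{base_prompt}\n\nPrevious attempt:\n{candidate}\n\n"
            "Fix these errors:\n" + "\n".join(errors)
        )
    return None  # No valid plan within the iteration budget.
```

Returning `None` when the budget runs out keeps failed generations explicit, so a caller can count them separately from successful ones.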

Figure 1: Task plans generated by LLMs may contain syntactical errors in domain-specific languages. By using verifier-assisted iterative prompting, CLAIRify can generate a valid program, which can be executed by a robot.

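Experimental Results

Research Questions

RQ1: Can automated iterative prompting improve zero-shot generation of DSL task plans?
RQ2: Does verifier-guided iteration produce syntactically correct, executable DSL programs more effectively than the baseline?
RQ3: Can the generated DSL plans be executed by a real robot when integrated with a TAMP framework?
RQ4: How does incorporating environment constraints affect plan validity and feasibility?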

Key Findings

Dataset	Method	Plans generated ↑	Expert preference ↑
Chem-RnD	SynthReader [16]	92/108	13/108
Chem-RnD	CLAIRify [ours]	105/108	75/108
Chem-EDU	SynthReader [16]	0/40	-
Chem-EDU	CLAIRify [ours]	40/40	-

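On Chem-RnD, CLAIRify successfully generated XDL plans for 105 of 108 experiments, versus 92 of 108 for SynthReader.
On Chem-EDU, CLAIRify successfully generated XDL plans for all 40 experiments, versus 0 of 40 for SynthReader.
On Chem-RnD, experts preferred CLAIRify's plan in 75 of 108 cases, versus 13 of 108 for SynthReader.
Verifier interactions averaged 2.58 per experiment on Chem-RnD and 1.15 on Chem-EDU, indicating that the feedback loop is effective.
When integrated with a TAMP framework, CLAIRify plans were executed by a robot in real-world experiments (a color-change task and a lemonade task).
Error analysis shows that, compared with the baseline, CLAIRify reduces missing actions but introduces other action and parameter errors, which richer domain knowledge could mitigate.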
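
To compare the two methods on a common scale, the raw counts reported above can be converted to percentages. The snippet below is only a convenience sketch: the counts are copied from the results table, while the helper names `percent` and `report` are illustrative.

```python
# Counts copied from the results table: (successes, total).
generated = {
    ("Chem-RnD", "SynthReader"): (92, 108),
    ("Chem-RnD", "CLAIRify"): (105, 108),
    ("Chem-EDU", "SynthReader"): (0, 40),
    ("Chem-EDU", "CLAIRify"): (40, 40),
}
preferred = {
    ("Chem-RnD", "SynthReader"): (13, 108),
    ("Chem-RnD", "CLAIRify"): (75, 108),
}


def percent(successes: int, total: int) -> str:
    """Format a count as a percentage with one decimal place."""
    return f"{100 * successes / total:.1f}%"


def report(title: str, counts: dict) -> None:
    """Print one line per (dataset, method) pair."""
    print(title)
    for (dataset, method), (k, n) in counts.items():
        print(f"  {dataset:9s} {method:12s} {k}/{n} = {percent(k, n)}")


report("Plans generated", generated)
report("Expert preference", preferred)
```

On Chem-RnD this works out to roughly 85.2% versus 97.2% for plan generation and 12.0% versus 69.4% for expert preference.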

Figure 2 : System overview : The LLM takes the input (1), structured language definition, and (optionally) resource constraints and generates unverified structured language (2). The output is examined by the verifier, and is passed to LLM with feedback (3). The LLM-generated outputs passes through t


This review was generated by AI and reviewed by a human editor.