QUICK REVIEW

[Paper Review] REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction

Zeyi Liu, Arpit Bahety | arXiv (Cornell University) | Jun 27, 2023

Topic Modeling | Cited by 18

One-Sentence Summary

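REFLECT queries a large language model (LLM) with a hierarchical, multisensory summary of the robot's experiences to obtain failure explanations and correction plans; it is evaluated on RoboFail using both simulated and real-world data.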

ABSTRACT

The ability to detect and analyze failed executions automatically is crucial for an explainable and robust robotic system. Recently, Large Language Models (LLMs) have demonstrated strong reasoning abilities on textual inputs. To leverage the power of LLMs for robot failure explanation, we introduce REFLECT, a framework which queries LLM for failure reasoning based on a hierarchical summary of robot past experiences generated from multisensory observations. The failure explanation can further guide a language-based planner to correct the failure and complete the task. To systematically evaluate the framework, we create the RoboFail dataset with a variety of tasks and failure scenarios. We demonstrate that the LLM-based framework is able to generate informative failure explanations that assist successful correction planning.

Motivation and Goals

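Advance robust and explainable robotics by having robots automatically reflect on their past failures.
Develop a multisensory, hierarchical summary of robot experiences for failure reasoning.
Use LLMs to generate natural-language failure explanations and correction plans.
Create RoboFail, a dataset of robot failure demonstrations, and use it for evaluation.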

Proposed Method

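Build a three-level hierarchical robot summary (sensory-input level, event level, and subgoal level) from multisensory observations (RGB-D, audio, robot states).
Convert the sensory data into task-informed scene graphs and audio summaries, so that the resulting summary is informative about the task.
Query the LLM progressively: first determine whether each subgoal succeeded, then generate a failure explanation from the relevant history retrieved from the summary.
Ask the LLM for a correction plan, and use embedding-based matching to map each generated action to an action that is executable in the environment.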
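The two LLM-facing steps above (progressive querying and mapping plan steps to executable actions) can be sketched in Python. This is a minimal illustration under stated assumptions, not REFLECT's implementation: the prompt wording, the function names (`progressive_failure_query`, `map_to_executable`), the scripted `fake_llm`, and the toy bag-of-words `embed` (standing in for whatever embedding model the authors use) are all hypothetical, and "relevant history" is simplified to every subgoal summary up to the failure. It shows the order of questions (verify subgoals first, explain only the first failed one) and nearest-neighbour matching by cosine similarity.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical LLM interface: prompt string in, reply string out.
LLM = Callable[[str], str]


def progressive_failure_query(
    subgoals: List[str], summaries: Dict[str, str], llm: LLM
) -> Optional[Tuple[str, str]]:
    """Stage 1: check subgoals in order. Stage 2: explain the first failure."""
    for i, goal in enumerate(subgoals):
        verdict = llm(
            f"Observation at the end of the subgoal: {summaries[goal]}\n"
            f"Was the subgoal '{goal}' achieved? Answer yes or no."
        )
        if verdict.strip().lower().startswith("no"):
            # Assumption: "relevant history" = all subgoal summaries up to the failure.
            history = "\n".join(f"{g}: {summaries[g]}" for g in subgoals[: i + 1])
            explanation = llm(f"History:\n{history}\nExplain briefly why '{goal}' failed.")
            return goal, explanation
    return None  # no subgoal was judged a failure


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_to_executable(step: str, executable_actions: List[str]) -> str:
    """Pick the executable action most similar to a free-form plan step."""
    query = embed(step)
    return max(executable_actions, key=lambda action: cosine(query, embed(action)))


# Usage with a scripted fake LLM (hypothetical; replace with a real model call).
def fake_llm(prompt: str) -> str:
    if prompt.endswith("Answer yes or no."):
        return "no" if "gripper empty" in prompt else "yes"
    return "The grasp slipped, so the cup never left the counter."


subgoals = ["go to counter", "pick up cup", "put cup in sink"]
summaries = {
    "go to counter": "robot is next to the counter",
    "pick up cup": "gripper empty, cup on counter",
    "put cup in sink": "cup on counter, sink empty",
}
print(progressive_failure_query(subgoals, summaries, fake_llm))
actions = ["pick_up(cup)", "put_in(cup, sink)", "open(fridge)"]
print(map_to_executable("Pick up the cup again", actions))
```

Stopping at the first failed subgoal keeps the explanation prompt short and focused, which is the intuition behind asking progressively rather than handing the LLM the whole episode at once.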

Fig 1: A framework for robot failure explanation and correction. On the left, we show the REFLECT framework that converts multisensory observations (RGB-D, audio, robot states) to a hierarchical summary of robot experiences. The summary is then used to query a Large Language Model (LLM) for failure

Experimental Results

Research Questions

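RQ1: Does the hierarchical multisensory summary enable LLMs to localize and explain failures accurately?
RQ2: Does progressive failure explanation improve localization and explanation quality compared with non-progressive querying?
RQ3: Can LLM-generated correction plans effectively recover from failures in simulated and real-world robot tasks?
RQ4: How does including the audio modality affect explanation and localization performance?
RQ5: How does REFLECT compare with caption-based and no-explanation baselines in handling failures?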

Key Findings

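Simulation results (%). Exp = failure explanation, Loc = failure localization, Co-plan = correction-planning success rate; "execution" and "planning" refer to execution failures and planning failures.
Method	Exp (execution)	Loc (execution)	Co-plan (execution)	Exp (planning)	Loc (planning)	Co-plan (planning)
w/o progressive	46.5	62.8	60.5	61.4	70.2	64.9
Subgoal only	76.7	74.4	51.2	71.9	73.7	75.4
LLM summary	55.8	67.4	65.1	57.9	54.4	66.7
w/o explanation	-	-	41.9	-	-	56.1
REFLECT	88.4	96.0	79.1	84.2	80.7	80.7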

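REFLECT achieves the highest explanation, localization, and correction-planning scores of all compared methods in simulation.
In simulation, REFLECT reaches 88.4% explanation accuracy, 96.0% localization accuracy, and a 79.1% correction-planning success rate on execution failures; on planning failures the corresponding figures are 84.2%, 80.7%, and 80.7%.
In real-world experiments, REFLECT outperforms the baselines on execution failures with 68.8% explanation and 93.8% localization accuracy; on planning failures it achieves 78.6% explanation and 78.6% localization accuracy.
Ablations show that progressive failure explanation improves performance over the non-progressive baseline, and that audio helps explain failures that vision alone cannot capture.
BLIP2 captions perform poorly for failure explanation, whereas the zero-shot, task-informed summary captures the necessary information about object states and spatial relations.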
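To make the "highest scores" claim concrete, the simulation numbers in the table above can be turned into per-column margins over the strongest baseline. A short Python sketch; the variable and function names are illustrative, and the numbers are copied from the table (with "-" entries treated as not applicable):

```python
from typing import Dict, List, Optional

# Simulation results from the table above, in %; None stands for "-" (not applicable).
COLUMNS = [
    "Exp (execution)", "Loc (execution)", "Co-plan (execution)",
    "Exp (planning)", "Loc (planning)", "Co-plan (planning)",
]
BASELINES: Dict[str, List[Optional[float]]] = {
    "w/o progressive": [46.5, 62.8, 60.5, 61.4, 70.2, 64.9],
    "Subgoal only": [76.7, 74.4, 51.2, 71.9, 73.7, 75.4],
    "LLM summary": [55.8, 67.4, 65.1, 57.9, 54.4, 66.7],
    "w/o explanation": [None, None, 41.9, None, None, 56.1],
}
REFLECT = [88.4, 96.0, 79.1, 84.2, 80.7, 80.7]


def margins_over_best_baseline(
    reflect: List[float], baselines: Dict[str, List[Optional[float]]]
) -> Dict[str, float]:
    """Percentage-point gap between REFLECT and the strongest baseline in each column."""
    margins = {}
    for i, column in enumerate(COLUMNS):
        best = max(row[i] for row in baselines.values() if row[i] is not None)
        margins[column] = round(reflect[i] - best, 1)
    return margins


for column, gap in margins_over_best_baseline(REFLECT, BASELINES).items():
    print(f"{column}: +{gap} points")
```

The gaps range from 21.6 points (localizing execution failures, where "Subgoal only" is the strongest baseline at 74.4) down to 5.3 points (correction planning for planning failures).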

Fig 2: Hierarchical robot summary is composed of: a) a sensory-input summary that converts multisensory robot observations (RGB-D, sound, robot states) into task-informed scene graphs and audio summary; b) an event-based summary that generates captions for key event frames; c) a subgoal-based summar
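The three summary levels described in Fig 2 can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the class and function names are hypothetical, perception (turning RGB-D, audio, and robot states into scene graphs and audio summaries, level a) is taken as already done, and the key-frame rule (keep a frame when its scene graph changes or a sound is detected) is a simple stand-in, not a rule confirmed by this review.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class SceneGraph:
    """Level (a) output: object states and spatial relations for one frame."""
    states: FrozenSet[Tuple[str, str]]          # (object, state), e.g. ("faucet", "off")
    relations: FrozenSet[Tuple[str, str, str]]  # (subject, relation, object)

    def to_text(self) -> str:
        parts = [f"{obj} is {state}" for obj, state in sorted(self.states)]
        parts += [f"{a} {rel} {b}" for a, rel, b in sorted(self.relations)]
        return ", ".join(parts)


@dataclass
class Frame:
    t: float              # timestamp in seconds
    graph: SceneGraph
    audio: Optional[str]  # audio summary for this frame, if a sound was detected
    subgoal: str          # subgoal the robot was executing


def event_summary(frames: List[Frame]) -> List[str]:
    """Level (b): caption only key frames (assumed rule: graph change or sound)."""
    events: List[str] = []
    prev: Optional[SceneGraph] = None
    for f in frames:
        if f.graph != prev or f.audio:
            line = f"{f.t:.1f}s: {f.graph.to_text()}"
            if f.audio:
                line += f"; sound: {f.audio}"
            events.append(line)
        prev = f.graph
    return events


def subgoal_summary(frames: List[Frame]) -> List[str]:
    """Level (c): one line per subgoal, describing the scene at its last frame."""
    last: Dict[str, Frame] = {}
    for f in frames:
        last[f.subgoal] = f  # dicts keep first-insertion order, so subgoals stay in sequence
    return [f"{goal}: {fr.graph.to_text()}" for goal, fr in last.items()]


# Toy episode: the robot grasps a cup, then drops it on the floor.
on_counter = SceneGraph(frozenset({("faucet", "off")}), frozenset({("cup", "on", "counter")}))
in_gripper = SceneGraph(frozenset({("faucet", "off")}), frozenset({("cup", "in", "gripper")}))
on_floor = SceneGraph(frozenset({("faucet", "off")}), frozenset({("cup", "on", "floor")}))
frames = [
    Frame(0.0, on_counter, None, "pick up cup"),
    Frame(1.0, on_counter, None, "pick up cup"),  # no change and no sound: skipped
    Frame(2.0, in_gripper, None, "pick up cup"),
    Frame(3.0, on_floor, "object dropped", "put cup in sink"),
]
print("\n".join(event_summary(frames)))
print("\n".join(subgoal_summary(frames)))
```

Representing each frame as object states plus spatial relations, rather than a free-form image caption, is what lets the text summary state facts such as "cup on floor" explicitly, in line with the finding above that task-informed summaries beat generic captions.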

This review was generated by AI and reviewed by human editors.