QUICK REVIEW

[Paper Review] Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation

Suraj Nair, Eric Mitchell|arXiv (Cornell University)|Sep 2, 2021

Multimodal Machine Learning Applications | Cited by 23

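One-Sentence Summary

This paper proposes LOReL, a method for learning language-conditioned robot behavior from offline, sub-optimal robot data and crowd-sourced natural language annotations. By training a language-conditioned reward classifier on paired language instructions and state transitions, LOReL enables visual model-predictive control to reach a 66% average success rate on real-world, language-specified manipulation tasks, and it outperforms goal-image and imitation-learning baselines by more than 25%.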

ABSTRACT

We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. In order to accomplish this, humans need easy and effective ways of specifying tasks to the robot. Goal images are one popular form of task specification, as they are already grounded in the robot's observation space. However, goal images also have a number of drawbacks: they are inconvenient for humans to provide, they can over-specify the desired behavior leading to a sparse reward signal, or under-specify task information in the case of non-goal reaching tasks. Natural language provides a convenient and flexible alternative for task specification, but comes with the challenge of grounding language in the robot's observation space. To scalably learn this grounding we propose to leverage offline robot datasets (including highly sub-optimal, autonomously collected data) with crowd-sourced natural language labels. With this data, we learn a simple classifier which predicts if a change in state completes a language instruction. This provides a language-conditioned reward function that can then be used for offline multi-task RL. In our experiments, we find that on language-conditioned manipulation tasks our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%, and is able to perform visuomotor tasks from natural language, such as "open the right drawer" and "move the stapler", on a Franka Emika Panda robot.

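Research Motivation and Goals

Enable general-purpose robots to learn diverse visuomotor manipulation tasks from natural language instructions.
Ground language in a high-dimensional robot observation space using scalable, non-expert data collection.
Develop a method that uses sub-optimal, autonomously collected offline data together with crowd-sourced language annotations to learn language-conditioned policies efficiently.
Outperform goal-image and imitation-learning approaches by supporting flexible task specification, including non-goal-reaching tasks, which improves generalization and the handling of sparse rewards.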

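Proposed Method

Use an offline dataset of sub-optimal robot trajectories without action labels, collected autonomously by random, scripted, or reinforcement-learning policies.
Annotate each trajectory with a natural language description of the behavior it shows, via crowd-sourcing (for example, Amazon Mechanical Turk).
Train a binary classifier that predicts whether a state transition (from an initial image to a final image) satisfies a given natural language instruction.
Use the trained classifier as a language-conditioned reward function for multi-task policy learning in offline reinforcement learning.
Combine the learned reward with visual model-predictive control and a learned dynamics model to execute language-specified tasks on a real robot.
Add negative examples by swapping the initial and final states, which improves temporal consistency and guards against overfitting.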
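
The steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the state is a single number (a drawer position) rather than an image, `toy_reward` and `toy_dynamics` are hypothetical stand-ins for the learned classifier and the learned dynamics model, and plain random shooting is swapped in as a simple form of model-predictive control. All function names and parameters are assumptions made for illustration. The sketch shows how flipped transitions become negative examples and how a language-conditioned reward can score imagined rollouts.

```python
import random


def make_reward_examples(trajectories):
    """Build (initial, final, instruction, label) tuples for a binary classifier."""
    examples = []
    for traj in trajectories:
        s0, s_final, text = traj["initial"], traj["final"], traj["instruction"]
        examples.append((s0, s_final, text, 1))  # annotated transition: positive
        examples.append((s_final, s0, text, 0))  # flipped transition: negative
    return examples


def toy_reward(s0, s_final, instruction):
    """Hypothetical stand-in for the learned classifier on a 1-D drawer position."""
    delta = s_final - s0
    if instruction == "open the drawer":
        return 1.0 if delta > 0.5 else 0.0
    if instruction == "close the drawer":
        return 1.0 if delta < -0.5 else 0.0
    return 0.0


def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned dynamics model: the action shifts the state."""
    return state + action


def plan(s0, instruction, reward_fn, dynamics_fn, horizon=5, num_samples=200, seed=0):
    """Random-shooting MPC: sample action sequences, keep the best-scoring one."""
    rng = random.Random(seed)
    best_actions, best_score = None, float("-inf")
    for _ in range(num_samples):
        actions = [rng.uniform(-0.3, 0.3) for _ in range(horizon)]
        state = s0
        for action in actions:
            state = dynamics_fn(state, action)  # imagined rollout, no real robot
        score = reward_fn(s0, state, instruction)  # reward compares start and end
        if score > best_score:
            best_actions, best_score = actions, score
    return best_actions, best_score


if __name__ == "__main__":
    data = [{"initial": 0.0, "final": 1.0, "instruction": "open the drawer"}]
    print(make_reward_examples(data))
    actions, score = plan(0.0, "open the drawer", toy_reward, toy_dynamics)
    print(score, sum(actions))
```

Because the reward takes both the initial and the predicted final state, the same planner can serve instructions that describe a change rather than a fixed goal image. Only the instruction string passed to `plan` changes.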

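Experimental Results

Research Questions

RQ1: Can language-conditioned visuomotor policies be learned effectively from sub-optimal, autonomously collected offline data?
RQ2: Can crowd-sourced natural language annotation of such data yield robust language grounding without expert teleoperated demonstrations?
RQ3: How does LOReL's language-conditioned reward compare with goal-image and imitation-learning approaches in success rate and generalization?
RQ4: To what extent does the learned reward generalize to unseen, rephrased language instructions?
RQ5: Can the method deliver high performance on real-world, long-horizon manipulation tasks specified in natural language?

Key Findings

LOReL achieves a 66% average success rate on five real-world language-conditioned tasks on a Franka Emika Panda robot, including "open the right drawer" and "move the stapler".
Removing the negative examples (flipped states) reduces performance by 30%, confirming their importance for learning temporal progress.
In simulation, LOReL outperforms language-conditioned imitation learning and goal-image baselines by more than 25% on language-conditioned manipulation tasks.
The method is robust to complex rephrasings, reaching a 70% success rate on "fully open the small black-and-white drawer on the left" and a 50% success rate on "push the small gray stapler around on the black table".
Using a pretrained language model enables zero-shot generalization to unseen natural language instructions, indicating effective transfer of linguistic knowledge.
The method achieves high performance on real-world tasks without optimal trajectories or expert-annotated actions, demonstrating its scalability and practicality.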

This review was generated by AI and reviewed by a human editor.