QUICK REVIEW

[Paper Review] Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition

Huy Thuc Ha, Pete Florence|arXiv (Cornell University)|Jul 26, 2023

Robot Manipulation and Learning | Cited by 12

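One-sentence summary

This paper presents a framework that uses an LLM-guided data-generation pipeline, combining 6DoF robot primitives with a verify-and-retry mechanism, to build a large-scale language-labelled dataset, which is then distilled into a multi-task, language-conditioned visuomotor diffusion policy that achieves higher success rates and sim-to-real transfer.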

ABSTRACT

We present a framework for robot skill acquisition, which 1) efficiently scales up data generation of language-labelled robot data and 2) effectively distills this data down into a robust multi-task language-conditioned visuo-motor policy. For (1), we use a large language model (LLM) to guide high-level planning, and sampling-based robot planners (e.g. motion or grasp samplers) for generating diverse and rich manipulation trajectories. To robustify this data-collection process, the LLM also infers a code-snippet for the success condition of each task, simultaneously enabling the data-collection process to detect failure and retry as well as the automatic labeling of trajectories with success/failure. For (2), we extend the diffusion policy single-task behavior-cloning approach to multi-task settings with language conditioning. Finally, we propose a new multi-task benchmark with 18 tasks across five domains to test long-horizon behavior, common-sense reasoning, tool-use, and intuitive physics. We find that our distilled policy successfully learned the robust retrying behavior in its data collection procedure, while improving absolute success rates by 33.2% on average across five domains. Code, data, and additional qualitative results are available on https://www.cs.columbia.edu/~huy/scalingup/.

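Motivation and goals

Scale up language-labelled robot data using LLM-guided task planning and 6DoF exploration primitives.
Make data collection more robust through inferred success functions and automatic retries, so the system can recover from failures.
Distill the collected experience into a multi-task, language-conditioned visuo-motor policy using a diffusion model.
Introduce a new benchmark of 18 tasks across five domains for long-horizon manipulation that requires common-sense reasoning and tool use.
Demonstrate higher success rates and transfer to the real world via domain randomization.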

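Proposed method

Language-guided data generation: the LLM recursively decomposes a task into subtasks (a task tree) and grounds them in 6DoF exploration primitives.
Plans are grounded as calls to robot utilities, including sampling-based motion planning and grasp/place samplers.
A success-function code snippet inferred by the LLM verifies each trajectory and drives the retry behavior.
Successful trajectories are distilled into a multi-task, language-conditioned diffusion policy whose inputs are CLIP language features, proprioception history, and two RGB views.
A DDIM scheduler is used for efficient diffusion-policy inference.
The benchmark covers 18 tasks across five domains built on MuJoCo and evaluates long-horizon manipulation and domain generalization.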
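
The verify-and-retry loop described in the method list can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Subtask`, `Trajectory`, `collect`, and `max_retries` are hypothetical names, the world state is a toy dictionary rather than a simulator state, and the success check is a hand-written lambda standing in for the LLM-inferred code snippet.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, bool]  # hypothetical toy world state, not the paper's simulator state


@dataclass
class Subtask:
    description: str                     # language label attached to the trajectory
    primitive: Callable[[State], None]   # stand-in for a sampled 6DoF primitive
    is_success: Callable[[State], bool]  # stand-in for the LLM-inferred success snippet


@dataclass
class Trajectory:
    description: str
    attempts: int
    success: bool


def collect(subtasks: List[Subtask], state: State, max_retries: int = 3) -> List[Trajectory]:
    """Run each subtask, verify it, retry on failure, and label the outcome."""
    dataset: List[Trajectory] = []
    for task in subtasks:
        attempts, success = 0, False
        while attempts < max_retries and not success:
            attempts += 1
            task.primitive(state)              # act
            success = task.is_success(state)   # verify
        dataset.append(Trajectory(task.description, attempts, success))
        if not success:
            break  # later subtasks assume this one succeeded
    return dataset


if __name__ == "__main__":
    rng = random.Random(0)

    def try_open_drawer(state: State) -> None:
        # Succeeds only some of the time, like a sampled grasp or motion plan.
        if rng.random() < 0.5:
            state["drawer_open"] = True

    plan = [Subtask("open the drawer", try_open_drawer,
                    lambda s: s.get("drawer_open", False))]
    for traj in collect(plan, {}):
        print(traj)
```

Because every record carries a success flag, a downstream step could keep only the successful trajectories for distillation, which is how the summary describes the policy's training data.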

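Experimental results

Research questions

RQ1: Can language-guided data generation scale autonomous, task-directed exploration to cover diverse 6DoF manipulation tasks?
RQ2: Can a language-conditioned diffusion policy effectively learn a multi-task visuo-linguo-motor policy from labelled successful data?
RQ3: Does the verify-and-retry mechanism improve the robustness of data collection and the performance of the downstream policy?
RQ4: How well does the distilled policy transfer to the real world through domain randomization (Sim2Real)?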

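Key findings

The distilled policy learned robust retrying behavior and improved absolute success rates by 33.2% on average across the five domains.
Verification and retrying improve performance in every domain; without retries (no-retry), performance can drop sharply (for example, a 0.0% success rate in the mailbox domain).
6DoF exploration makes it possible to handle complex geometries and articulated objects, which supplies diverse data for distillation.
Sim2Real transfer of the policy reached roughly 76% success on five novel objects.
Data generation that combines LLM-guided planning with success inference improves task-directed exploration beyond planar-action baselines.
The diffusion-based, language-conditioned multi-task policy outperforms an MLP-based decoder and the no-retry baseline.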
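
For context on how a diffusion-based policy like the one above produces actions at inference time, here is a minimal sketch of deterministic DDIM sampling (eta = 0) on a flat action vector. It is an illustration, not the paper's code: `ddim_step`, `ddim_sample`, and `predict_noise` are hypothetical names, the noise predictor is a plain callable rather than a network conditioned on language, proprioception, and images, and the schedule is passed as cumulative alpha products that must lie strictly between 0 and 1.

```python
import math
from typing import Callable, List, Sequence

NoiseFn = Callable[[Sequence[float], float], Sequence[float]]


def ddim_step(x_t: Sequence[float], eps_pred: Sequence[float],
              alpha_bar_t: float, alpha_bar_prev: float) -> List[float]:
    """One deterministic DDIM update from noise level alpha_bar_t to alpha_bar_prev."""
    out = []
    for x, eps in zip(x_t, eps_pred):
        # Estimate the clean sample implied by the predicted noise.
        x0_pred = (x - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
        # Re-noise that estimate to the next (less noisy) level using the same noise.
        out.append(math.sqrt(alpha_bar_prev) * x0_pred
                   + math.sqrt(1.0 - alpha_bar_prev) * eps)
    return out


def ddim_sample(x_T: Sequence[float], predict_noise: NoiseFn,
                alpha_bars: Sequence[float]) -> List[float]:
    """Denoise x_T along alpha_bars, ordered from noisiest to cleanest step.

    The last step targets alpha_bar = 1.0, i.e. a noise-free sample.
    """
    x = list(x_T)
    for i, a_t in enumerate(alpha_bars):
        a_prev = alpha_bars[i + 1] if i + 1 < len(alpha_bars) else 1.0
        eps = predict_noise(x, a_t)  # in a real policy: a conditioned network
        x = ddim_step(x, eps, a_t, a_prev)
    return x
```

Because the update is deterministic, the number of entries in `alpha_bars` can be much smaller than the number of training timesteps, which is the usual reason DDIM is chosen for faster inference.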


This review was generated by AI and reviewed by a human editor.