QUICK REVIEW

[Paper Review] Waypoint-Based Imitation Learning for Robotic Manipulation

Lucy Xiaoyang Shi, Archit Sharma | arXiv (Cornell University) | Jul 26, 2023

Robot Manipulation and Learning | Cited by 8

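ONE-LINE SUMMARY

The paper introduces Automatic Waypoint Extraction (AWE), a preprocessing method that automatically selects a minimal set of waypoints from each demonstration such that linear interpolation between them reconstructs the trajectory within a given error budget. AWE can be plugged into any behavioral cloning (BC) algorithm. It improves Diffusion Policy and ACT, raising success rates by up to 25% in simulation and by 4–28% on real-world bimanual manipulation tasks, while shortening the decision-making horizon.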

ABSTRACT

While imitation learning methods have seen a resurgent interest for robotic manipulation, the well-known problem of compounding errors continues to afflict behavioral cloning (BC). Waypoints can help address this problem by reducing the horizon of the learning problem for BC, and thus, the errors compounded over time. However, waypoint labeling is underspecified, and requires additional human supervision. Can we generate waypoints automatically without any additional human supervision? Our key insight is that if a trajectory segment can be approximated by linear motion, the endpoints can be used as waypoints. We propose Automatic Waypoint Extraction (AWE) for imitation learning, a preprocessing module to decompose a demonstration into a minimal set of waypoints which when interpolated linearly can approximate the trajectory up to a specified error threshold. AWE can be combined with any BC algorithm, and we find that AWE can increase the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, reducing the decision making horizon by up to a factor of 10. Videos and code are available at https://lucys0.github.io/awe/

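MOTIVATION AND OBJECTIVES

Reduce compounding errors in imitation learning by shortening the BC decision-making horizon through automatic waypoint selection.
Provide waypoint extraction that needs no additional human supervision and relies only on proprioceptive data from the demonstrations.
Demonstrate that AWE is compatible with state-of-the-art BC methods and works on real-world robot tasks.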

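PROPOSED METHOD

Define the reconstruction loss L as the maximum proprioceptive distance between the ground-truth trajectory τ and its reconstruction f(W), obtained by linearly interpolating between the waypoints W.
Use dynamic programming to select the minimum number of waypoints W such that L(f(W), τ) ≤ η.
Preprocess the demonstrations by relabeling each training sample with the next waypoint, so that BC learns to predict waypoints rather than raw actions.
Combine AWE with Diffusion Policy and with Action Chunking with Transformers (ACT), and evaluate performance in simulation and on real-robot tasks.
Discuss practical considerations, including policy expressiveness and how the error budget η affects the number of waypoints and task performance.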
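The three method steps above (loss, waypoint selection, relabeling) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' released code: it treats a demonstration as a `(T, D)` array of proprioceptive states, takes the per-segment loss to be the largest Euclidean distance from an interior state to the straight segment between consecutive waypoints (one plausible reading of "maximum proprioceptive distance"), and finds the fewest waypoints with a shortest-path style dynamic program. Relabeling targets here are waypoint states; a real pipeline may label actions instead. The helper names `point_segment_distance`, `segment_error`, `extract_waypoints`, and `relabel_with_next_waypoint` are hypothetical.

```python
# Hypothetical sketch of AWE-style waypoint extraction (not the authors' code).
import bisect

import numpy as np


def point_segment_distance(p, a, b):
    """Euclidean distance from state p to the straight segment a -> b."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    if denom == 0.0:  # degenerate segment: both endpoints coincide
        return float(np.linalg.norm(p - a))
    # Project p onto the line through a and b, clamped to the segment.
    t = np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))


def segment_error(states, i, j):
    """Assumed loss for one segment: max distance of interior states i+1..j-1."""
    return max(
        (point_segment_distance(states[k], states[i], states[j]) for k in range(i + 1, j)),
        default=0.0,
    )


def extract_waypoints(states, eta):
    """Fewest waypoint indices whose linear interpolation stays within eta.

    Index 0 is the implicit start; the last index is always a waypoint.
    cost[j] = fewest segments needed to reach state j from state 0.
    """
    T = len(states)
    cost = [0] + [np.inf] * (T - 1)
    parent = [-1] * T
    for j in range(1, T):
        for i in range(j):
            # Check the cheap cost test first, then the expensive error test.
            if cost[i] + 1 < cost[j] and segment_error(states, i, j) <= eta:
                cost[j] = cost[i] + 1
                parent[j] = i
    # Walk parent pointers back from the final state to recover the waypoints.
    waypoints, j = [], T - 1
    while j > 0:
        waypoints.append(j)
        j = parent[j]
    return waypoints[::-1]


def relabel_with_next_waypoint(states, waypoints):
    """Target for step t: the state at the first waypoint strictly after t."""
    targets = []
    for t in range(len(states)):
        k = min(bisect.bisect_right(waypoints, t), len(waypoints) - 1)
        targets.append(states[waypoints[k]])
    return np.stack(targets)


# Usage: a V-shaped 2-D path with a single corner at step 5.
x = np.arange(11, dtype=float)
states = np.stack([x, 5.0 - np.abs(x - 5.0)], axis=1)
print(extract_waypoints(states, eta=0.01))  # → [5, 10]
print(extract_waypoints(states, eta=10.0))  # → [10]
```

Because the assumed loss is a maximum over segments, a candidate split can be checked one segment at a time, which is what makes the prefix DP valid; a two-state segment has no interior states, so a solution exists for any η ≥ 0. The nested loops cost roughly O(T³·D) and favor clarity over speed. The usage lines show the trade-off governed by η: a tight budget keeps the corner as a waypoint, while a loose one collapses the whole path to a single segment.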

Figure 1: Our approach reduces the horizon of imitation learning by extracting waypoints from demonstrations.

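EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Does AWE improve the performance of representative BC methods on long-horizon manipulation tasks?
RQ2: Does AWE enable effective learning from demonstrations, both on simulation benchmarks and on physical robots?
RQ3: How do the error budget η and policy expressiveness affect the benefits of AWE?
RQ4: Is AWE complementary to both diffusion-based and Transformer-based BC architectures across tasks?
RQ5: What are the limitations of relying solely on proprioceptive signals for waypoint extraction?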

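KEY FINDINGS

AWE + ACT clearly outperforms ACT on two simulated bimanual tasks (up to a 25% gain in success rate) and also beats ACT on the real-world Screwdriver Handover, Wiping Table, and Coffee Making tasks, with gains of 4–28%.
Across RoboMimic tasks, AWE consistently improves Diffusion Policy as the number of demonstrations scales from 30 to 200, with notable gains on long-horizon tasks (e.g., 18% on Square with 30 demonstrations).
AWE reduces the effective training horizon by 7× to 10×, so that low-level control over much of the trajectory is driven by linearly interpolated segments.
In real-robot experiments, AWE raises success rates on three dexterous tasks, with the largest improvement (28%) on Coffee Making and consistent gains on Screwdriver Handover and Wiping the Table.
AWE's benefits depend on using an expressive policy class (e.g., a GMM) that can handle the multimodality introduced by waypoint labeling; with unimodal BC, adding AWE may degrade performance.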
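A toy illustration of the multimodality point in the last finding, written as a Python sketch with hypothetical numbers (not data from the paper): when demonstrations that start from the same observation head to different waypoints, a unimodal policy trained with squared error regresses to the average label, which can be a waypoint no demonstration ever visits. An idealized two-component mixture is shown for contrast; its component means are read off directly rather than fitted with EM, so it stands in for a trained GMM head rather than implementing one. The helper `nearest_demo_gap` is hypothetical.

```python
import numpy as np

# Hypothetical labels: from one observation, half the demos go to a waypoint
# at -1.0 and half to a waypoint at +1.0 (e.g. passing left or right of an obstacle).
targets = np.array([-1.0] * 50 + [1.0] * 50)

# The squared-error minimizer over a constant prediction is the mean label.
mse_prediction = targets.mean()

# Idealized 2-component mixture: one mean per mode, read off directly (no EM).
mixture_means = np.array([targets[targets < 0].mean(), targets[targets > 0].mean()])


def nearest_demo_gap(prediction):
    """Distance from a predicted waypoint to the closest demonstrated waypoint."""
    return float(np.min(np.abs(np.unique(targets) - prediction)))


print(mse_prediction, nearest_demo_gap(mse_prediction))  # → 0.0 1.0
print([nearest_demo_gap(m) for m in mixture_means])  # → [0.0, 0.0]
```

The unimodal prediction lands a full unit away from both demonstrated waypoints, while each mixture component sits exactly on one mode.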

Figure 2: Visualizing the loss $\mathcal{L}$.


This review was generated by AI and checked by a human editor.