QUICK REVIEW

[Paper Review] Transporter Networks: Rearranging the Visual World for Robotic Manipulation

Andy Zeng, Pete Florence|arXiv (Cornell University)|Oct 27, 2020

Robot Manipulation and Learning|Citations: 100

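One-line summary

Transporter Networks learn to rearrange deep visual features to predict the spatial displacements that parameterize manipulation actions. They achieve high sample efficiency without object-centric representations and generalize to unseen objects and multi-step tasks.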

ABSTRACT

Robotic manipulation can be formulated as inducing a sequence of spatial displacements: where the space being moved can encompass an object, part of an object, or end effector. In this work, we propose the Transporter Network, a simple model architecture that rearranges deep features to infer spatial displacements from visual input - which can parameterize robot actions. It makes no assumptions of objectness (e.g. canonical poses, models, or keypoints), it exploits spatial symmetries, and is orders of magnitude more sample efficient than our benchmarked alternatives in learning vision-based manipulation tasks: from stacking a pyramid of blocks, to assembling kits with unseen objects; from manipulating deformable ropes, to pushing piles of small objects with closed-loop feedback. Our method can represent complex multi-modal policy distributions and generalizes to multi-step sequential tasks, as well as 6DoF pick-and-place. Experiments on 10 simulated tasks show that it learns faster and generalizes better than a variety of end-to-end baselines, including policies that use ground-truth object poses. We validate our methods with hardware in the real world. Experiment videos and code are available at https://transporternets.github.io

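Motivation and objectives

Motivate sample-efficient, end-to-end vision-based manipulation that does not depend on explicit object representations.
Propose the Transporter Network architecture, which preserves spatial structure and predicts displacements by transporting features.
Demonstrate strong generalization to unseen objects, multi-step tasks, and 6DoF placement.
Show better sample efficiency than end-to-end baselines across diverse tabletop tasks, including deformable objects and piles of small objects.
Provide a path to deployment on real robots and an open-source benchmark for vision-based manipulation.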

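Proposed method

Decompose manipulation into two pose primitives, pick and place, and learn both as the transport of local visual crops.
Use a spatially consistent representation: RGB-D input is unprojected into 3D and rendered as a top-down orthographic projection, which preserves geometry.
Model dense, pixel-wise Q maps with fully convolutional networks (FCNs).
Condition the place on the pick: transport a crop of features around the pick location and cross-correlate it with the scene features to select the placement pose (template matching with dense embeddings).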
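Extend from SE(2) to SE(3) with a hybrid discrete-continuous approach: a discrete SE(2) placement followed by 3-channel regression of the remaining rotational and translational degrees of freedom, which keeps the SE(2) prediction multi-modal while the remaining degrees of freedom stay continuous.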
Train end-to-end from demonstrations with a simple cross-entropy loss on pixel-wise pick and place labels, optionally adding a regression loss for the continuous degrees of freedom.
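
The pick-conditioned placement step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature maps are random stand-ins for FCN outputs, the crop size is arbitrary, discrete rotations of the crop are omitted, and all function names (`crop_around`, `cross_correlate`, `place_given_pick`, `pixel_cross_entropy`) are hypothetical. It only shows the idea of cropping features around the pick, cross-correlating the crop with scene features, and scoring the result with a pixel-wise cross-entropy.

```python
import numpy as np


def crop_around(features, center, half):
    """Return the (2*half+1)-square crop of an (H, W, C) map centred on `center`."""
    # Zero-pad so crops near the border keep a fixed size.
    padded = np.pad(features, ((half, half), (half, half), (0, 0)))
    r, c = center
    return padded[r:r + 2 * half + 1, c:c + 2 * half + 1, :]


def cross_correlate(scene, kernel):
    """Same-size cross-correlation of an (H, W, C) map with a (k, k, C) kernel."""
    k = kernel.shape[0]
    half = k // 2
    padded = np.pad(scene, ((half, half), (half, half), (0, 0)))
    # windows has shape (H, W, C, k, k): one k-by-k patch per output pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k), axis=(0, 1))
    return np.einsum("hwcij,ijc->hw", windows, kernel)


def argmax2d(score_map):
    """Pixel coordinates (row, col) of the highest score."""
    idx = np.unravel_index(np.argmax(score_map), score_map.shape)
    return tuple(int(i) for i in idx)


def place_given_pick(query_feat, key_feat, pick, half=2):
    """Transport the crop around `pick` and match it against the scene features."""
    kernel = crop_around(query_feat, pick, half)
    logits = cross_correlate(key_feat, kernel)
    return argmax2d(logits), logits


def pixel_cross_entropy(logits, label):
    """Cross-entropy of a softmax over all pixels against a one-hot pixel label."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


# Toy data (assumption): random "query" features, and "key" features that are
# zero everywhere except for a copy of the pick neighbourhood at the target.
rng = np.random.default_rng(0)
query_feat = rng.normal(size=(16, 16, 4))
key_feat = np.zeros((16, 16, 4))

q_pick = np.zeros((16, 16))
q_pick[3, 3] = 1.0                      # stand-in for a dense pick Q map
pick = argmax2d(q_pick)                 # (3, 3)

key_feat[8:13, 10:15] = query_feat[1:6, 1:6]   # 5x5 patch centred on (10, 12)

place, logits = place_given_pick(query_feat, key_feat, pick)
print(place)  # (10, 12)
print(pixel_cross_entropy(logits, place))
```

Because the correlation is computed densely over every pixel, the same crop can score highly at several locations at once, which is how a map of this kind can represent multi-modal placement distributions.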

Experimental results

Research questions

RQ1: Can Transporter Networks achieve high sample efficiency without object-centric representations in vision-based manipulation?
RQ2: How well do Transporter Networks generalize to unseen objects, configurations, and multi-step tasks (including 6DoF placement)?
RQ3: What is the impact of preserving spatial structure and using feature transport on learning multi-modal action distributions?
RQ4: How do Transporter Networks perform relative to object-centric baselines and purely end-to-end models across a range of tabletop tasks?
RQ5: Can the framework handle deformables and piles of objects with closed-loop visual feedback?

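Key findings

Transporter Networks are highly sample efficient, often reaching high success rates with orders of magnitude less data than image-based baselines.
The method generalizes to unseen objects and configurations, including 6DoF placement through the hybrid SE(2)/SE(3) approach.
It models multi-modal spatial action distributions and supports multi-step sequential tasks with closed-loop visual feedback.
Across 10 simulated tabletop tasks, Transporter Networks outperform baselines such as Form2Fit and ConvMLP across data regimes of 1, 10, 100, and 1000 demonstrations.
The approach is also practical on real robots and extends to deformable objects and piles of small objects.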

This review was generated by AI and checked by a human editor.