QUICK REVIEW

[Paper Review] One-Shot Imitation Learning

Yan Duan, Marcin Andrychowicz|arXiv (Cornell University)|Mar 21, 2017

Domain Adaptation and Few-Shot Learning|References: 40|Citations: 227

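One-Sentence Summary

This paper proposes a meta-learning approach to one-shot imitation learning: a policy conditioned on a single demonstration imitates a new task, and soft attention is used to generalize to unseen tasks.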

ABSTRACT

Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineering. In this paper, we propose a meta-learning framework for achieving such capability, which we call one-shot imitation learning. Specifically, we consider the setting where there is a very large set of tasks, and each task has many instantiations. For example, a task could be to stack all blocks on a table into a single tower, another task could be to place all blocks on a table into two-block towers, etc. In each case, different instances of the task would consist of different sets of blocks with different initial states. At training time, our algorithm is presented with pairs of demonstrations for a subset of all tasks. A neural net is trained that takes as input one demonstration and the current state (which initially is the initial state of the other demonstration of the pair), and outputs an action with the goal that the resulting sequence of states and actions matches as closely as possible with the second demonstration. At test time, a demonstration of a single instance of a new task is presented, and the neural net is expected to perform well on new instances of this new task. The use of soft attention allows the model to generalize to conditions and tasks unseen in the training data. We anticipate that by training this model on a much greater variety of tasks and settings, we will obtain a general system that can turn any demonstrations into robust policies that can accomplish an overwhelming variety of tasks. Videos available at https://bit.ly/nips2017-oneshot .
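The training setup described in the abstract (pairs of demonstrations of the same task, one used for conditioning and the other for supervision) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names, the toy task names, and the representation of a demonstration as a list of `(state, action)` steps are all assumptions made here.

```python
import random

def make_toy_demo(task, instance, length=4):
    """Hypothetical toy demonstration: a list of (state, action) steps."""
    return [((task, instance, t), f"{task}-action-{t}") for t in range(length)]

def sample_training_example(demos_by_task, rng):
    """Pick a task, then two distinct demonstrations of it.

    The first demonstration is what the policy is conditioned on; one
    (state, action) step of the second serves as the supervised target.
    """
    task = rng.choice(sorted(demos_by_task))
    cond_demo, target_demo = rng.sample(demos_by_task[task], 2)
    state, action = rng.choice(target_demo)
    return cond_demo, state, action

# Toy data: two tasks with three demonstrations (instances) each.
demos_by_task = {
    task: [make_toy_demo(task, i) for i in range(3)]
    for task in ("single_tower", "two_block_towers")
}

rng = random.Random(0)
cond_demo, state, action = sample_training_example(demos_by_task, rng)
```

A policy would then be trained so that its output for `(cond_demo, state)` matches `action`. Because both demonstrations come from the same task but different instances, the conditioning demonstration is the only information about which task to perform.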

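Research Motivation and Objectives

Enable a policy to learn a new task from a single demonstration, where tasks are drawn from a potentially unbounded task distribution.
Develop a training framework in which the policy takes a demonstration and the current observation as input and outputs actions for unseen tasks.
Show that attention mechanisms enable generalization across different task configurations and numbers of objects.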

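Proposed Method

Formulate a policy pi(a|o, d) conditioned on an input demonstration d and the current observation o.
Train on demonstrations drawn from the task distribution so that a single demonstration guides actions on a new instance of the same task.
Apply temporal dropout to subsample long demonstrations, which improves generalization.
Apply neighborhood attention over block positions to capture relationships between blocks and extract relevant context.
Adopt a three-module architecture: Demonstration Network, Context Network, and Manipulation Network.
Use soft attention (including multi-head attention) to handle variable-length demonstrations and a variable number of objects.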
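Two of the ingredients above, temporal dropout and soft attention over a variable-length demonstration, can be illustrated with a minimal pure-Python sketch. It is not the paper's architecture: the function names, the dot-product scoring, the `keep_prob` parameter, and the choice to always keep the final frame are assumptions made for illustration.

```python
import math
import random

def temporal_dropout(demo, keep_prob, rng):
    """Keep each frame with probability keep_prob, preserving order.

    The final frame is always kept (an assumption of this sketch) so the
    subsampled demonstration is never empty.
    """
    kept = [frame for frame in demo[:-1] if rng.random() < keep_prob]
    return kept + [demo[-1]]

def soft_attention(query, keys, values):
    """Softmax-weighted average of values, for any number of keys."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights

# Toy demonstration: 20 frames, each a 2-dimensional feature vector.
demo = [[float(t), 1.0] for t in range(20)]
rng = random.Random(0)
thinned = temporal_dropout(demo, keep_prob=0.2, rng=rng)

# The context vector has a fixed size regardless of how many frames remain.
query = [0.1, 0.0]
context, weights = soft_attention(query, thinned, thinned)
```

The point of the sketch is that the attention output has the same dimensionality whether the demonstration has 3 frames or 300, which is what lets a single network consume demonstrations of different lengths.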

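Experimental Results

Research Questions

RQ1: Does a single demonstration of a new task enable robust policy execution on unseen instances of that task?
RQ2: Is conditioning on the full demonstration better than conditioning on only the final state or on a limited set of snapshots of the trajectory?
RQ3: In this one-shot imitation setting, is training with behavioral cloning comparable to, or competitive with, DAGGER?
RQ4: How well does the model generalize to tasks in the block-stacking domain that were not seen during training?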

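Key Findings

The one-shot imitation approach yields policies that perform well on instances of a new task after a single demonstration.
Conditioning on the full demonstration begins to outperform conditioning on the final state as task difficulty (number of stages) increases.
Temporal dropout, which downsamples demonstrations, improves generalization and acts as a regularizer.
Behavioral cloning performs comparably to DAGGER in this setting, suggesting that interactive supervision may not be necessary.
Attention visualizations show that the model focuses on a small subset of blocks and on key frames corresponding to task stages.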

This review was generated by AI and checked by a human editor.