QUICK REVIEW

[Paper Review] One-Shot Imitation Learning

Yan Duan, Marcin Andrychowicz|arXiv (Cornell University)|2017. 03. 21.

Domain Adaptation and Few-Shot Learning | References: 40 | Citations: 227

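One-Line Summary

The paper introduces a meta-learning approach to one-shot imitation learning: a neural policy conditioned on a single demonstration imitates a new task, and soft attention lets it generalize to unseen tasks.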

ABSTRACT

Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineering. In this paper, we propose a meta-learning framework for achieving such capability, which we call one-shot imitation learning. Specifically, we consider the setting where there is a very large set of tasks, and each task has many instantiations. For example, a task could be to stack all blocks on a table into a single tower, another task could be to place all blocks on a table into two-block towers, etc. In each case, different instances of the task would consist of different sets of blocks with different initial states. At training time, our algorithm is presented with pairs of demonstrations for a subset of all tasks. A neural net is trained that takes as input one demonstration and the current state (which initially is the initial state of the other demonstration of the pair), and outputs an action with the goal that the resulting sequence of states and actions matches as closely as possible with the second demonstration. At test time, a demonstration of a single instance of a new task is presented, and the neural net is expected to perform well on new instances of this new task. The use of soft attention allows the model to generalize to conditions and tasks unseen in the training data. We anticipate that by training this model on a much greater variety of tasks and settings, we will obtain a general system that can turn any demonstrations into robust policies that can accomplish an overwhelming variety of tasks. Videos available at https://bit.ly/nips2017-oneshot .
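The training setup described in the abstract (pairs of demonstrations of the same task, one used for conditioning and one as the imitation target) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `toy_demo`, `sample_training_pair`, `toward_final_state`, and `behavioral_cloning_loss` are hypothetical, each demonstration is assumed to be a `(states, actions)` pair of NumPy arrays, the policy is a hand-written placeholder rather than a neural net, and a squared-error loss is assumed for continuous actions.

```python
import numpy as np


def toy_demo(rng, length=5, dim=2):
    """Hypothetical demonstration: random states, actions pointing at the final state."""
    states = rng.normal(size=(length, dim))
    actions = states[-1] - states
    return states, actions


def sample_training_pair(demos_by_task, rng):
    """Pick one task, then two distinct demonstrations of that task."""
    tasks = sorted(demos_by_task)
    task = tasks[rng.integers(len(tasks))]
    demos = demos_by_task[task]
    i, j = rng.choice(len(demos), size=2, replace=False)
    return task, demos[i], demos[j]


def toward_final_state(state, cond_demo):
    """Placeholder policy (assumption): move toward the conditioning demo's last state."""
    cond_states, _ = cond_demo
    return cond_states[-1] - state


def behavioral_cloning_loss(policy, cond_demo, target_demo):
    """Mean squared error between predicted and demonstrated actions."""
    states, actions = target_demo
    preds = np.stack([policy(s, cond_demo) for s in states])
    return float(np.mean((preds - actions) ** 2))


rng = np.random.default_rng(0)
demos_by_task = {
    "stack_single_tower": [toy_demo(rng) for _ in range(3)],
    "two_block_towers": [toy_demo(rng) for _ in range(3)],
}
task, cond_demo, target_demo = sample_training_pair(demos_by_task, rng)
loss = behavioral_cloning_loss(toward_final_state, cond_demo, target_demo)
print(task, loss)
```

In a real system the placeholder policy would be a trainable network and the loss would be minimized by gradient descent over many sampled pairs; the sketch only shows how the pair is drawn and scored.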

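Motivation and Objectives

Enable a policy to learn a new task from a single demonstration, where tasks come from a potentially infinite distribution.
Develop a learning framework in which the policy maps (demonstration, current observation) to actions on unseen tasks.
Show that attention mechanisms enable generalization across varying task configurations and numbers of objects.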

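Proposed Method

Formalize a policy π(a | o, d) conditioned on an input demonstration d and the current observation o.
Train on demonstrations drawn from a distribution of tasks so that a single demonstration guides behavior on a new instance of the same task.
Use temporal dropout to downsample long demonstrations and improve generalization.
Apply neighborhood attention over block positions to relate blocks to one another and extract relevant context.
Use an architecture of three modules: Demonstration Network, Context Network, and Manipulation Network.
Use soft attention (and multi-head attention) to handle variable-length demonstrations and a variable number of objects.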
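Two of the ideas listed above, temporal dropout and soft attention over a variable-length demonstration, can be sketched as follows. This is a hedged illustration, not the paper's architecture: the function names, the `keep_prob` value, and the choice to always keep the final frame are assumptions made here, and scaled dot-product attention is swapped in for whatever attention form the paper uses.

```python
import numpy as np


def temporal_dropout(demo, keep_prob, rng):
    """Randomly keep a subset of time steps, preserving their order."""
    keep = rng.random(len(demo)) < keep_prob
    keep[-1] = True  # assumption: always keep the final frame so the result is never empty
    return demo[keep]


def soft_attention(query, keys, values):
    """Scaled dot-product soft attention of one query over N items."""
    scores = keys @ query / np.sqrt(query.shape[0])
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()  # softmax: weights are positive and sum to 1
    return weights @ values, weights


rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 8))  # hypothetical demo: 200 frames, 8 features each
short = temporal_dropout(demo, keep_prob=0.1, rng=rng)
query = rng.normal(size=8)  # stands in for an embedding of the current state
context, weights = soft_attention(query, short, short)
print(short.shape, context.shape)
```

The point of the design is that `context` has a fixed size no matter how many frames (or blocks) are attended over, which is what lets a single network consume demonstrations of different lengths.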

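Experimental Results

Research Questions

RQ1: Does a single demonstration of a new task enable robust policy execution on unseen instances of that task?
RQ2: Does conditioning on the full demonstration outperform conditioning on the final state or on a limited snapshot of the trajectory?
RQ3: In this one-shot imitation setting, does behavioral cloning match or outperform DAGGER?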
RQ4: Without any reinforcement learning during training, to what extent can the model generalize to tasks it never saw in training?

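Key Results

The one-shot imitation approach lets the policy perform well on new task instances after a single demonstration.
As task difficulty (number of stages) increases, conditioning on the full demonstration begins to outperform conditioning on the final state.
Temporal dropout, including downsampling of demonstrations, improves generalization and has a regularizing effect.
In this setting, behavioral cloning performs comparably to DAGGER, suggesting that interactive supervision may not be necessary.
Attention visualizations show that the model focuses on a small number of blocks and key frames that correspond to the stages of the task.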

This review was generated by AI and checked by a human editor.