QUICK REVIEW

[Paper Review] Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

Paul F. Christiano, Zain Shah | arXiv (Cornell University) | October 11, 2016

Reinforcement Learning in Robotics · References: 3 · Citations: 166

One-Line Summary

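This paper presents a method for transferring policies learned in simulation to the real world by learning a deep inverse dynamics model in the target domain: the simulator predicts the next observation the policy is aiming for, and the inverse model selects the real-world action that achieves it.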

ABSTRACT

Developing control policies in simulation is often more practical and safer than directly running experiments in the real world. This applies to policies obtained from planning and optimization, and even more so to policies obtained from reinforcement learning, which is often very data demanding. However, a policy that succeeds in simulation often doesn't work when deployed on a real robot. Nevertheless, often the overall gist of what the policy does in simulation remains valid in the real world. In this paper we investigate such settings, where the sequence of states traversed in simulation remains reasonable for the real world, even if the details of the controls are not, as could be the case when the key differences lie in detailed friction, contact, mass and geometry properties. During execution, at each time step our approach computes what the simulation-based control policy would do, but then, rather than executing these controls on the real robot, our approach computes what the simulation expects the resulting next state(s) will be, and then relies on a learned deep inverse dynamics model to decide which real-world action is most suitable to achieve those next states. Deep models are only as good as their training data, and we also propose an approach for data collection to (incrementally) learn the deep inverse dynamics model. Our experiments show our approach compares favorably with various baselines that have been developed for dealing with simulation to real world model discrepancy, including output error control and Gaussian dynamics adaptation.
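The per-step loop in the abstract (ask the simulation policy for an action, let the simulator predict the resulting next state, then let an inverse dynamics model choose the real action that reaches it) can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: the 1-D point mass, the gains SOURCE_GAIN and TARGET_GAIN, the proportional policy pi_source, and the least-squares inverse model standing in for a deep network are all hypothetical.

```python
import numpy as np

# Toy 1-D point mass (hypothetical, not the paper's robots).
# Source ("simulator"):  x' = x + 1.0 * a
# Target ("real robot"): x' = x + 0.5 * a, e.g. an unmodelled heavier mass.
SOURCE_GAIN, TARGET_GAIN = 1.0, 0.5


def source_step(x, a):
    return x + SOURCE_GAIN * a


def target_step(x, a):
    return x + TARGET_GAIN * a


def pi_source(x, goal=1.0):
    # Hypothetical proportional policy that works well in the simulator.
    return 0.5 * (goal - x)


def fit_inverse_dynamics(n=200, seed=0):
    # Fit phi: (x, x_next) -> a on random target-domain transitions.
    # Linear least squares stands in for the deep network used in the paper.
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, n)
    a = rng.uniform(-1.0, 1.0, n)
    x_next = target_step(x, a)
    features = np.column_stack([x, x_next, np.ones(n)])
    w, *_ = np.linalg.lstsq(features, a, rcond=None)
    return lambda x, x_next: float(np.array([x, x_next, 1.0]) @ w)


def run(steps, step_fn, choose_action):
    # Roll out a controller from x = 0 and return the visited states.
    xs = [0.0]
    for _ in range(steps):
        xs.append(step_fn(xs[-1], choose_action(xs[-1])))
    return np.array(xs)


def transfer_action(phi):
    # Per-step transfer: ask the source policy what to do, ask the simulator
    # where that leads, then ask phi which target action gets there.
    def choose(x):
        a_source = pi_source(x)
        x_next_hat = source_step(x, a_source)
        return phi(x, x_next_hat)
    return choose


phi = fit_inverse_dynamics()
reference = run(10, source_step, pi_source)           # policy in simulation
naive = run(10, target_step, pi_source)               # direct transfer
adapted = run(10, target_step, transfer_action(phi))  # inverse-dynamics transfer

print("naive max deviation:  ", np.abs(naive - reference).max())
print("adapted max deviation:", np.abs(adapted - reference).max())
```

Because the toy inverse model is exact, the adapted rollout reproduces the simulated trajectory, while executing the simulator's actions directly undershoots it; on a real robot phi would only approximate this.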

Motivation and Goals

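Leverage a strong source-domain policy so that it performs well in the target domain (often the real world) despite the simulation-to-reality gap.
Exploit the fact that the high-level behavior of the policy remains valid even when the fine-grained control details differ because of friction, contact, and other dynamics.
Develop an online data-collection strategy for learning the deep inverse dynamics model (phi) that adjusts actions in the target domain.
Demonstrate transfer efficacy through Sim1→Sim2 and Sim→Real experiments, including contact-rich tasks.
Compare against baselines that handle model mismatch with output error control or Gaussian dynamics adaptation.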

Proposed Method

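At each time step, compute the source-domain action a_source = pi_source(tau_-k:), where tau_-k: is the most recent k steps of the trajectory.
Predict the next source-domain observation o_next_hat = o(T_source(tau_-k:, a_source)) using the simulator's transition function T_source.
Select the target-domain action a_target with the learned inverse dynamics model phi(tau_-k:, o_next_hat).
Train phi to map (oHistory, aHistory, o_next) to the action that produced each previously achieved transition.
Include a history window H so that phi can capture temporal dependencies and latent factors of the dynamics.
Collect training data by running a preliminary target-domain policy with selective exploration noise, refining phi incrementally.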
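The training and data-collection steps above (a history window H, achieved transitions (oHistory, aHistory, o_next) mapped to the action that produced them, and incremental refinement with exploration noise) can be sketched as follows. Everything here is a hypothetical stand-in rather than the paper's setup: a 1-D system with an unobserved, decaying velocity plays the role of latent dynamics, linear least squares replaces the deep network, and the helper names (features, collect_episode, fit_weights) are illustrative only.

```python
import numpy as np

# Hypothetical target domain: only the position is observed, but the true
# state also contains a velocity that decays by 0.8 per step. The velocity is
# a latent factor that a single observation cannot reveal.
def target_step(pos, vel, a):
    vel = 0.8 * vel + a
    return pos + vel, vel


def features(obs, acts, t, H, o_next):
    # History window: last H observations and H-1 previous actions,
    # followed by the desired next observation and a bias term.
    past_obs = obs[t - H + 1 : t + 1]
    past_acts = acts[t - H + 1 : t]
    return np.concatenate([past_obs, past_acts, [o_next, 1.0]])


def make_phi(w, H):
    # Inverse dynamics model: (history, desired next obs) -> action.
    return lambda obs, acts, t, o_next: float(features(obs, acts, t, H, o_next) @ w)


def collect_episode(rng, phi, H, steps=30, noise=0.3):
    # Preliminary target-domain policy: aim for the next observation a
    # (hypothetical) source controller would predict, map it through the
    # current phi when one exists, then add exploration noise.
    pos, vel = 0.0, 0.0
    obs, acts = [pos], []
    goal = rng.uniform(-1.0, 1.0)
    for t in range(steps):
        o_next_hat = pos + 0.5 * (goal - pos)
        if phi is not None and t >= H - 1:
            a = phi(np.array(obs), np.array(acts), t, o_next_hat)
        else:
            a = o_next_hat - pos  # naive guess before any model exists
        a += rng.normal(0.0, noise)
        pos, vel = target_step(pos, vel, a)
        obs.append(pos)
        acts.append(a)
    return np.array(obs), np.array(acts)


def to_dataset(episodes, H):
    # Each achieved transition becomes (history, o_next) -> action taken.
    X, y = [], []
    for obs, acts in episodes:
        for t in range(H - 1, len(acts)):
            X.append(features(obs, acts, t, H, obs[t + 1]))
            y.append(acts[t])
    return np.array(X), np.array(y)


def fit_weights(episodes, H):
    # Linear least squares as a stand-in for a deep network.
    X, y = to_dataset(episodes, H)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def mse(w, episodes, H):
    X, y = to_dataset(episodes, H)
    return float(np.mean((X @ w - y) ** 2))


rng = np.random.default_rng(0)
H = 2
episodes, phi = [], None
for _ in range(5):  # incremental rounds: collect, refit, redeploy
    episodes += [collect_episode(rng, phi, H) for _ in range(10)]
    phi = make_phi(fit_weights(episodes, H), H)

held_out = [collect_episode(rng, phi, H) for _ in range(10)]
err_h1 = mse(fit_weights(episodes, 1), held_out, 1)
err_h2 = mse(fit_weights(episodes, 2), held_out, 2)
print(f"held-out MSE  H=1: {err_h1:.4f}  H=2: {err_h2:.2e}")
```

With H=2 the model can infer the hidden velocity from two consecutive observations and fits the held-out actions almost exactly, while H=1 cannot; in a very simplified way, this mirrors why the history window helps capture latent factors of the dynamics.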

Experimental Results

Research Questions

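RQ1: Can a deep inverse dynamics model learned in the target domain enable effective transfer of a source-domain policy?
RQ2: Does combining predicted next observations with an inverse dynamics model outperform direct policy transfer or forward-dynamics adaptation for sim-to-real transfer, particularly under contact-rich dynamics?
RQ3: How does history-aware inverse dynamics learning affect data efficiency and adaptation performance?
RQ4: How does the proposed method compare with the output error control and Gaussian dynamics adaptation baselines under various dynamics conditions?
RQ5: Is action adaptation alone, without state/observation adaptation, sufficient for robust sim-to-real transfer?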

Key Findings

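The proposed method achieves compelling transfer from simulation to the real world, including challenging contact-rich dynamics.
The adaptation outperforms baselines such as output error control and Gaussian dynamics adaptation in both the Sim1→Sim2 and Sim→Real settings.
Including history in the inverse dynamics model reduces the amount of data required and improves convergence.
Training on goal-directed, task-relevant data converges faster than training on data from random exploration.
In the Sim→Real Fetch experiments, the method substantially reduces deviation from the simulated trajectories compared with a PD baseline.
The method remains effective under changes in gravity and motor noise, and it also handles discontinuities caused by contact.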

This review was generated by AI and reviewed by a human editor.