QUICK REVIEW

[Paper Review] Deep Reinforcement Learning for Robotic Manipulation-The state of the art

Smruti Amarjyoti|arXiv (Cornell University)|2017. 01. 31.

Reinforcement Learning in Robotics|References: 20|Citations: 57

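One-Line Summary

A survey that organizes DRL methods for robotic manipulation by action space (DAS vs. CAS) and by policy representation (SCAS vs. DCAS), covering the main algorithms, architectures, and their implementations in simulation and on real-world robots.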

ABSTRACT

The focus of this work is to enumerate the various approaches and algorithms that center around application of reinforcement learning in robotic manipulation tasks. Earlier methods utilized specialized policy representations and human demonstrations to constrict the policy. Such methods worked well with continuous state and policy space of robots but failed to come up with generalized policies. Subsequently, high dimensional non-linear function approximators like neural networks have been used to learn policies from scratch. Several novel and recent approaches have also embedded control policy with efficient perceptual representation using deep learning. This has led to the emergence of a new branch of dynamic robot control system called deep reinforcement learning (DRL). This work embodies a survey of the most recent algorithms, architectures and their implementations in simulations and real world robotic platforms. The gamut of DRL architectures are partitioned into two different branches namely, discrete action space algorithms (DAS) and continuous action space algorithms (CAS). Further, the CAS algorithms are divided into stochastic continuous action space (SCAS) and deterministic continuous action space (DCAS) algorithms. Along with elucidating an organisation of the DRL algorithms this work also manifests some of the state of the art applications of these approaches in robotic manipulation tasks.

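Research Motivation and Goals

Motivates the use of DRL for robotic manipulation over traditional hand-designed policies.
Organizes DRL approaches by discrete versus continuous action spaces and by stochastic versus deterministic policies.
Explains how deep learning enables end-to-end visuomotor control and policy representation.
Highlights practical considerations around sim-to-real transfer, training stability, and sample efficiency.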

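Proposed Method

Classifies DRL algorithms into discrete action space (DAS) and continuous action space (CAS) algorithms.
Subdivides CAS into stochastic continuous action space (SCAS) and deterministic continuous action space (DCAS) algorithms.
Describes the core algorithms (DQN, Double DQN, Dueling Networks, NAF, policy gradient variants, TRPO, DDPG) and their applicability to robotics.
Discusses visuomotor control with deep networks and experience replay for stabilizing training.
Summarizes implementation aspects, including CNN-based policies, actor-critic architectures, and parallel/asynchronous training.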
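
To make the experience replay idea mentioned above more concrete, the following is a minimal sketch of a uniform replay buffer in plain Python. It is not taken from the surveyed paper: the names `Transition` and `ReplayBuffer`, the FIFO eviction rule, and uniform sampling are illustrative assumptions, and practical implementations often store tensors and may use prioritized sampling instead.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One interaction step (s, a, r, s', done). Hypothetical container."""
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO store sampled uniformly at random (illustrative)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque with maxlen drops the oldest item once capacity is reached.
        self._storage: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement decorrelates consecutive
        # steps of a trajectory before they reach the learner.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)
```

In a training loop, each robot or simulator step would be passed to `add`, and the learner would call `sample` once enough transitions have accumulated.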

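Experimental Results

Research Questions

RQ1: Which DRL algorithms and architectures are most effective for robotic manipulation in discrete versus continuous action spaces?
RQ2: How do different policy representations (value-based, policy-based, actor-critic) perform on real-time robotic manipulation tasks?
RQ3: What challenges arise, and what solutions exist, in learning from visual input and in transferring from simulation to real robots?
RQ4: Which methods improve the sample efficiency and training stability of DRL for robotics?
RQ5: What gaps remain in transfer learning and reward design for complex manipulation tasks, and how can they be addressed?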

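Key Findings

DAS methods (e.g., DQN variants) suit robotics tasks with discrete actions but face difficulties in continuous action spaces.
CAS methods (policy search, actor-critic) are a more natural fit for continuous robot control, with DDPG serving as the main deterministic policy gradient approach.
NAF and DDPG show strong performance on continuous control tasks and real-time robotic manipulation.
Experience replay and target networks stabilize DRL training for vision-based robot control.
Asynchronous and parallel data collection substantially reduces training time as the number of robots grows, and improves sample efficiency.
The survey identifies gaps in transfer learning and reward design, and points to further research on inverse reinforcement learning and intrinsic motivation for temporal abstraction.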
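
As a rough illustration of how the target networks mentioned above enter the update, the sketch below computes the bootstrapped target for DQN and for Double DQN, plus the soft target update used in DDPG-style training. The function names, the use of plain Python lists in place of network outputs and parameters, and the default values of `gamma` and `tau` are assumptions made for this example, not details taken from the survey.

```python
from typing import List


def dqn_target(reward: float, next_q_target: List[float],
               gamma: float = 0.99, done: bool = False) -> float:
    """DQN: y = r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: List[float],
                      next_q_target: List[float],
                      gamma: float = 0.99, done: bool = False) -> float:
    """Double DQN: the online network picks the action and the target
    network evaluates it, which reduces overestimation of Q-values."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)),
                      key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


def soft_update(target_params: List[float], online_params: List[float],
                tau: float = 0.001) -> List[float]:
    """Soft update: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]
```

With a small `tau`, the target parameters trail the online parameters slowly, so the regression target in each update changes gradually rather than jumping with every gradient step.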


This review was generated by AI and reviewed by a human editor.