QUICK REVIEW

[Paper Review] Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning

Yuanpei Chen, Tianhao Wu | arXiv (Cornell University) | June 17, 2022

Reinforcement Learning in Robotics | Citations: 29

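One-line summary

This paper introduces Bi-DexHands, a bimanual dexterous manipulation benchmark built in Isaac Gym, and benchmarks single-agent RL, MARL, offline RL, multi-task RL, and meta-RL on more than 20 tasks to assess human-level bimanual dexterity. It also highlights that PPO-based methods are the strongest on simple tasks and draws attention to the open challenges of multi-task learning and few-shot generalization.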

ABSTRACT

Achieving human-level dexterity is an important open problem in robotics. However, tasks of dexterous hand manipulation, even at the baby level, are challenging to solve through reinforcement learning (RL). The difficulty lies in the high degrees of freedom and the required cooperation among heterogeneous agents (e.g., joints of fingers). In this study, we propose the Bimanual Dexterous Hands Benchmark (Bi-DexHands), a simulator that involves two dexterous hands with tens of bimanual manipulation tasks and thousands of target objects. Specifically, tasks in Bi-DexHands are designed to match different levels of human motor skills according to cognitive science literature. We built Bi-DexHands in the Isaac Gym; this enables highly efficient RL training, reaching 30,000+ FPS by only one single NVIDIA RTX 3090. We provide a comprehensive benchmark for popular RL algorithms under different settings; this includes Single-agent/Multi-agent RL, Offline RL, Multi-task RL, and Meta RL. Our results show that the PPO type of on-policy algorithms can master simple manipulation tasks that are equivalent up to 48-month human babies (e.g., catching a flying object, opening a bottle), while multi-agent RL can further help to master manipulations that require skilled bimanual cooperation (e.g., lifting a pot, stacking blocks). Despite the success on each single task, when it comes to acquiring multiple manipulation skills, existing RL algorithms fail to work in most of the multi-task and the few-shot learning settings, which calls for more substantial development from the RL community. Our project is open sourced at https://github.com/PKU-MARL/DexterousHands.

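Motivation and objectives

Design and release a scalable, high-fidelity bimanual manipulation benchmark built on two Shadow Hands.
Evaluate a broad range of RL configurations across diverse tasks: single-agent RL, MARL, offline RL, multi-task RL, and meta-RL.
Analyze generalization, multi-task learning, and few-shot adaptation on dexterous manipulation tasks.
Tie task difficulty to human motor development so that benchmarking is grounded in cognitive science and skill level.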

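Proposed method

The two Shadow Hands in Isaac Gym are formulated as a decentralized partially observable MDP (Dec-POMDP), which covers both the multi-agent setting and the single-agent case.
A dataset and task suite that includes objects from YCB and SAPIEN is provided to build diverse scenes.
Each task is mapped to an age from the infant Fine Motor Sub-test (FMS), which organizes difficulty into easy, medium, and hard.
On-policy PPO-based methods (PPO, HAPPO/HATRPO) and MARL methods (MAPPO, IPPO, MADDPG) are benchmarked across 20 tasks.
Offline RL baselines such as BC, BCQ, TD3+BC, and IQL are included, with random, replay, medium, and medium-expert datasets.
Multi-task RL and meta-RL (MT1/ML1, MT4/ML4, MT20/ML20) are explored using task-ID conditioning and meta-learning objectives.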
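
The task-ID conditioning mentioned in the last item can be illustrated with a minimal sketch. The review does not say how the task ID is encoded, so the one-hot encoding, the function names, and the dimensions below are all assumptions chosen for illustration, not the benchmark's actual implementation.

```python
from typing import List


def one_hot_task_id(task_index: int, num_tasks: int) -> List[float]:
    """Return a one-hot vector that identifies one task out of num_tasks."""
    if not 0 <= task_index < num_tasks:
        raise ValueError(f"task_index {task_index} is outside [0, {num_tasks})")
    return [1.0 if i == task_index else 0.0 for i in range(num_tasks)]


def condition_on_task(observation: List[float], task_index: int, num_tasks: int) -> List[float]:
    """Append the one-hot task ID to an observation (hypothetical helper).

    A single shared policy can then receive the same input layout for
    every task and use the trailing entries to tell the tasks apart.
    """
    return list(observation) + one_hot_task_id(task_index, num_tasks)


# Hypothetical sizes: a 6-dimensional observation and an MT4-style set of 4 tasks.
OBS_DIM = 6
NUM_TASKS = 4

observation = [0.0] * OBS_DIM
conditioned = condition_on_task(observation, task_index=2, num_tasks=NUM_TASKS)

print(len(conditioned))       # 10
print(conditioned[OBS_DIM:])  # [0.0, 0.0, 1.0, 0.0]
```

With this layout the policy input grows by one entry per task, which stays cheap for 20 tasks. A learned task embedding would be another option under the same assumption.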

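Experimental results

Research questions

RQ1: Can standard and extended RL algorithms learn human-like bimanual dexterity across a broad set of tasks?
RQ2: How do single-agent and multi-agent RL compare on tasks that require cooperation between the hands?
RQ3: How do offline, multi-task, and meta-RL affect performance and generalization in bimanual manipulation?
RQ4: How does task difficulty inspired by human motor development correlate with RL performance on the age-matched tasks?
RQ5: What are the limitations of, and future directions for, transferring learned skills to real robots and deformable objects?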

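Key findings

PPO-based on-policy methods achieve strong performance on many tasks, including the simpler bimanual skills.
Multi-agent RL improves performance on tasks that require coordinated bimanual cooperation and narrows the gap to PPO on harder tasks.
SAC performs poorly on many tasks in this setting, possibly because of off-policy instability and high-dimensional inputs.
Offline RL results expose value errors caused by out-of-distribution actions, marking Bi-DexHands as a challenging offline benchmark.
Cross-task generalization through multi-task RL and meta-RL is not consistently successful, leaving substantial room for algorithm development.
RL performance generally degrades as the task age increases (that is, as tasks get harder), which is reasonably consistent with the difficulty ordering derived from human motor development.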
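
The offline RL finding can be made concrete with TD3+BC, one of the baselines listed in this review. The objective below is recalled from the published TD3+BC formulation, not taken from this review, and is shown only to illustrate how an offline method limits out-of-distribution actions. Here D is the offline dataset, N is the batch size, and α is a hyperparameter.

```latex
\pi = \arg\max_{\pi} \;
\mathbb{E}_{(s,a)\sim\mathcal{D}}
\left[ \lambda\, Q\bigl(s, \pi(s)\bigr) - \bigl(\pi(s) - a\bigr)^{2} \right],
\qquad
\lambda = \frac{\alpha}{\frac{1}{N}\sum_{(s_i,a_i)} \left| Q(s_i, a_i) \right|}
```

The squared term keeps the policy's action close to the action stored in the dataset. Without such a constraint, Q(s, π(s)) is queried at actions that never appear in D, where the value estimate is unreliable. That is the kind of value error the finding refers to.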


This review was generated by AI and reviewed by a human editor.