QUICK REVIEW

[Paper Review] Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning

Yuanpei Chen, Tianhao Wu|arXiv (Cornell University)|Jun 17, 2022

Reinforcement Learning in Robotics | Cited by 29

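One-Sentence Summary

This paper introduces Bi-DexHands, a bimanual dexterous manipulation benchmark built in Isaac Gym, and benchmarks several RL formulations (single-agent, MARL, offline, multi-task, and meta-learning) on 20+ tasks to assess human-level bimanual dexterity. It highlights that PPO-based methods perform best on simple tasks and points out the challenges that remain in multi-task and few-shot generalization.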

ABSTRACT

Achieving human-level dexterity is an important open problem in robotics. However, tasks of dexterous hand manipulation, even at the baby level, are challenging to solve through reinforcement learning (RL). The difficulty lies in the high degrees of freedom and the required cooperation among heterogeneous agents (e.g., joints of fingers). In this study, we propose the Bimanual Dexterous Hands Benchmark (Bi-DexHands), a simulator that involves two dexterous hands with tens of bimanual manipulation tasks and thousands of target objects. Specifically, tasks in Bi-DexHands are designed to match different levels of human motor skills according to cognitive science literature. We built Bi-DexHands in the Isaac Gym; this enables highly efficient RL training, reaching 30,000+ FPS by only one single NVIDIA RTX 3090. We provide a comprehensive benchmark for popular RL algorithms under different settings; this includes Single-agent/Multi-agent RL, Offline RL, Multi-task RL, and Meta RL. Our results show that the PPO type of on-policy algorithms can master simple manipulation tasks that are equivalent up to 48-month human babies (e.g., catching a flying object, opening a bottle), while multi-agent RL can further help to master manipulations that require skilled bimanual cooperation (e.g., lifting a pot, stacking blocks). Despite the success on each single task, when it comes to acquiring multiple manipulation skills, existing RL algorithms fail to work in most of the multi-task and the few-shot learning settings, which calls for more substantial development from the RL community. Our project is open sourced at https://github.com/PKU-MARL/DexterousHands.

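Research Motivation and Goals

Design and provide a scalable, high-fidelity benchmark for bimanual dexterous manipulation using two Shadow Hands.
Evaluate a broad range of RL formulations (single-agent, MARL, offline RL, multi-task RL, meta RL) on diverse tasks.
Analyze generalization, multi-task learning, and few-shot adaptation in dexterous manipulation tasks.
Relate task difficulty to human motor development to guide cognition-aware and skill-aware benchmarking.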

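Proposed Method

Model two Shadow Hands in Isaac Gym as a decentralized partially observable Markov decision process (Dec-POMDP), which covers both the multi-agent setting and the single-agent case.
Provide a dataset and task collection containing YCB and SAPIEN objects to create diverse scenarios.
Map tasks to ages from the infant Fine Motor Sub-test (FMS) to structure task difficulty (easy/medium/hard).
Benchmark on-policy PPO-family methods (PPO, and the multi-agent HAPPO/HATRPO) alongside other MARL methods (MAPPO, IPPO, MADDPG) on 20 tasks.
Include offline RL baselines (BC, BCQ, TD3+BC, IQL) with random, replay, medium, and medium-expert datasets.
Explore multi-task and meta-RL (MT1/ML1, MT4/ML4, MT20/ML20) using task-ID conditioning and meta-learning objectives.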
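
To make the Dec-POMDP view and task-ID conditioning more concrete, here is a minimal sketch in plain Python. It is not the Bi-DexHands API: the names `HandAgent`, `make_agents`, `condition_on_task`, and `join_actions`, as well as the sizes `ACTIONS_PER_HAND` and `NUM_TASKS`, are hypothetical and chosen only for illustration. The sketch assumes each hand is treated as one agent that controls its own slice of the joint action vector, and that a multi-task policy sees a one-hot task ID appended to its local observation.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical sizes for illustration only; not taken from the paper.
ACTIONS_PER_HAND = 26
NUM_TASKS = 4  # e.g. an MT4-style multi-task setting


@dataclass
class HandAgent:
    """One agent in the Dec-POMDP view: a hand that owns part of the joint action."""
    name: str
    action_slice: slice


def make_agents(actions_per_hand: int = ACTIONS_PER_HAND) -> List[HandAgent]:
    """Create one agent per hand; together they cover the joint action vector."""
    return [
        HandAgent("right_hand", slice(0, actions_per_hand)),
        HandAgent("left_hand", slice(actions_per_hand, 2 * actions_per_hand)),
    ]


def one_hot(task_id: int, num_tasks: int = NUM_TASKS) -> List[float]:
    """Encode a task index as a one-hot vector."""
    if not 0 <= task_id < num_tasks:
        raise ValueError(f"task_id {task_id} out of range for {num_tasks} tasks")
    return [1.0 if i == task_id else 0.0 for i in range(num_tasks)]


def condition_on_task(
    local_obs: Sequence[float], task_id: int, num_tasks: int = NUM_TASKS
) -> List[float]:
    """Task-ID conditioning: append the one-hot task ID to an agent's observation."""
    return list(local_obs) + one_hot(task_id, num_tasks)


def join_actions(
    agents: Sequence[HandAgent], per_agent_actions: Dict[str, Sequence[float]]
) -> List[float]:
    """Assemble the joint action sent to the simulator from per-agent actions."""
    total = max(agent.action_slice.stop for agent in agents)
    joint = [0.0] * total
    for agent in agents:
        action = list(per_agent_actions[agent.name])
        width = agent.action_slice.stop - agent.action_slice.start
        if len(action) != width:
            raise ValueError(f"{agent.name} expects {width} action values")
        joint[agent.action_slice] = action
    return joint
```

In a single-agent setup, the same joint vector would come from one policy that observes both hands. In the multi-agent setup, each `HandAgent` would run its own policy on its own (conditioned) observation, and `join_actions` would merge the outputs before stepping the environment.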

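Experimental Results

Research Questions

RQ1: Can standard and extended RL algorithms learn human-like bimanual dexterity across a broad set of tasks?
RQ2: How do single-agent and multi-agent RL compare on tasks that require bimanual cooperation?
RQ3: How do offline, multi-task, and meta RL affect performance and generalization in bimanual manipulation?
RQ4: How does task difficulty, inspired by human motor development, correlate with RL performance on simulated tasks across ages?
RQ5: What are the limitations and future directions for transferring learned skills to real robots and deformable objects?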

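Key Findings

PPO-based on-policy methods achieve strong performance on many tasks, including the simpler bimanual skills.
Multi-agent RL improves performance on tasks that require coordinated bimanual cooperation, narrowing the gap with PPO on harder tasks.
SAC performs poorly on many tasks in this setting, possibly because of off-policy instability and high-dimensional inputs.
Offline RL results reveal value errors caused by out-of-distribution actions and highlight Bi-DexHands as a challenging offline benchmark.
Cross-task generalization in multi-task/meta RL does not succeed consistently, leaving substantial room for algorithmic development.
RL performance generally declines as task age increases (harder tasks), a plausible trend consistent with the difficulty design based on human motor development.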
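
One way to quantify the age-versus-performance trend in the last finding is a rank correlation between each task's assigned age and the score an algorithm reaches on it. The sketch below is an assumption about how a reader might check this, not the analysis used in the paper. The function names are hypothetical, and no values from the paper are included, so per-task ages and scores must be supplied by the reader.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with the entry at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

Calling `spearman(task_ages_in_months, task_scores)` with one entry per task gives a value in [-1, 1]. A clearly negative value would agree with the reported trend that performance drops as task age rises.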

This review was generated by AI and checked by human editors.