QUICK REVIEW

[Paper Review] Solving Rubik's Cube with a Robot Hand

OpenAI, Ilge Akkaya|arXiv (Cornell University)|Oct 16, 2019

Topic: Domain Adaptation and Few-Shot Learning | References: 111 | Cited by: 630

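One-Sentence Summary

The paper trains a control policy and a vision-based state estimator entirely in simulation with automatic domain randomization (ADR). It uses them to solve a Rubik's Cube with a five-fingered humanoid robot hand, demonstrating effective sim2real transfer.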

ABSTRACT

We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/

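Motivation and Goals

Demonstrate solving a Rubik's Cube with a five-fingered humanoid hand using only simulated training data.
Introduce automatic domain randomization (ADR), which generates an ever-growing and increasingly diverse set of training environments for both control policies and vision models.
Investigate why ADR-trained policies transfer to real hardware, and whether emergent meta-learning appears.
Build a robot platform and an accompanying simulation pipeline that closely model the manipulation and perception tasks involved.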

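Proposed Method

Develop a novel ADR algorithm that progressively expands the distribution over randomized simulation environments.
Train a memory-augmented (LSTM-based) control policy with reinforcement learning to solve the randomized tasks.
Train a vision-based Rubik's Cube pose estimator on simulated images rendered under ADR.
Build a detailed MuJoCo simulation of the Shadow Dexterous Hand together with a 3D Rubik's Cube model to narrow the sim2real gap.
Use a distributed, ADR-driven training pipeline with centralized storage (Redis) for parameters, data, and performance buffers.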
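The ADR idea above (ranges that expand as the policy masters them) can be illustrated with a small toy loop. This is a minimal, hypothetical sketch, not the paper's implementation. It assumes each randomized parameter has a uniform range [low, high] that starts as a single point. Sometimes one parameter is pinned to a boundary and the episode's performance is buffered for that boundary. Once a buffer is full, the boundary moves outward if the average is high and inward if it is low. All names (`ParamRange`, `ADR`, `buffer_size`, `t_low`, `t_high`, `boundary_prob`, `cube_friction`) and numeric values are illustrative assumptions. In-memory deques stand in for the Redis-backed performance buffers.

```python
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ParamRange:
    """Current uniform sampling range [low, high] for one simulator parameter."""
    low: float
    high: float
    step: float  # hypothetical amount a boundary moves per update
    # Performance results from episodes pinned to the lower / upper boundary.
    buf_low: deque = field(default_factory=deque)
    buf_high: deque = field(default_factory=deque)


class ADR:
    """Toy automatic domain randomization loop (illustrative, not the paper's code)."""

    def __init__(self, params, buffer_size=10, t_low=0.2, t_high=0.8,
                 boundary_prob=0.5, seed=0):
        self.params = params              # name -> ParamRange
        self.buffer_size = buffer_size    # results needed before a boundary update
        self.t_low, self.t_high = t_low, t_high
        self.boundary_prob = boundary_prob
        self.rng = random.Random(seed)

    def sample_env(self):
        """Sample one environment config; maybe pin one parameter to a boundary."""
        config = {n: self.rng.uniform(p.low, p.high) for n, p in self.params.items()}
        probe = None
        if self.rng.random() < self.boundary_prob:
            name = self.rng.choice(list(self.params))
            side = self.rng.choice(("low", "high"))
            config[name] = getattr(self.params[name], side)
            probe = (name, side)
        return config, probe

    def report(self, probe, performance):
        """Record a boundary episode's performance and move that boundary if warranted."""
        if probe is None:
            return  # ordinary episodes do not update the ranges
        name, side = probe
        p = self.params[name]
        buf = p.buf_low if side == "low" else p.buf_high
        buf.append(performance)
        if len(buf) < self.buffer_size:
            return
        mean = sum(buf) / len(buf)
        buf.clear()
        outward = -1.0 if side == "low" else 1.0  # direction that widens the range
        if mean >= self.t_high:
            delta = outward * p.step              # doing well: make it harder
        elif mean <= self.t_low:
            delta = -outward * p.step             # struggling: make it easier
        else:
            return
        if side == "low":
            p.low = min(p.low + delta, p.high)    # never let the range invert
        else:
            p.high = max(p.high + delta, p.low)


# Usage: a single hypothetical parameter that starts as a point value (zero width).
adr = ADR({"cube_friction": ParamRange(low=1.0, high=1.0, step=0.05)})
for _ in range(200):
    config, probe = adr.sample_env()
    # Placeholder: a real system would roll out the policy in `config` and
    # measure, e.g., how many goals it reached; here it always "succeeds".
    adr.report(probe, performance=1.0)

friction = adr.params["cube_friction"]
print(f"cube_friction range after training: [{friction.low:.2f}, {friction.high:.2f}]")
```

With manual domain randomization, `low` and `high` would be hand-picked once and left fixed. In this sketch a range only widens at a boundary where the policy already performs well, which is one way to read "a distribution over randomized environments of ever-increasing difficulty".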

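Experimental Results

Research Questions

RQ1: Can models trained with ADR purely in simulation transfer effectively to real-world Rubik's Cube manipulation with a humanoid hand?
RQ2: Does memory-augmented policy training under ADR exhibit emergent meta-learning when deployed in the real world?
RQ3: How does automatic domain randomization compare with manual domain randomization in achieving robust sim2real transfer?
RQ4: What key design considerations, on both the physical and the simulation platform, are needed to support ADR for complex manipulation tasks?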

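Key Findings

ADR enables successful sim2real transfer for a complex manipulation task involving a Rubik's Cube and a five-fingered hand.
Memory-augmented policies trained on the expanding ADR distribution show signs of emergent meta-learning at test time.
Vision state estimators trained under ADR can predict the cube's pose and face angles from real-world RGB camera input.
Systematically improving simulation fidelity (hand dynamics, cube model, sensor noise) improves transfer performance.
A dedicated robot platform and a scalable, distributed ADR training pipeline enable efficient training and evaluation.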

This review was generated by AI and checked by human editors.