QUICK REVIEW

[Paper Review] Sim-to-Real Robot Learning from Pixels with Progressive Nets

Andrei A. Rusu, Matej Vecerík|arXiv (Cornell University)|Oct 13, 2016

Reinforcement Learning in Robotics | Cited by 109

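One-Sentence Summary

This paper demonstrates the use of progressive networks to transfer end-to-end, pixel-to-action policies from simulation to a real robot, enabling fast learning on real hardware under sparse rewards.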

ABSTRACT

Applying end-to-end learning to solve complex, interactive, pixel-driven control tasks on a robot is an unsolved problem. Deep Reinforcement Learning algorithms are too slow to achieve performance on a real robot, but their potential has been demonstrated in simulated environments. We propose using progressive networks to bridge the reality gap and transfer learned policies from simulation to the real world. The progressive net approach is a general framework that enables reuse of everything from low-level visual features to high-level policies for transfer to new tasks, enabling a compositional, yet simple, approach to building complex skills. We present an early demonstration of this approach with a number of experiments in the domain of robot manipulation that focus on bridging the reality gap. Unlike other proposed approaches, our real-world experiments demonstrate successful task learning from raw visual input on a fully actuated robot manipulator. Moreover, rather than relying on model-based trajectory optimisation, the task learning is accomplished using only deep reinforcement learning and sparse rewards.

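Research Motivation and Objectives

Motivate and address the reality gap in end-to-end, pixel-to-action robot control learned with deep reinforcement learning.
Propose progressive networks as a transfer learning framework for reusing learned features and policies across tasks and domains.
Show through real-robot experiments that progressive networks speed up learning on a fully actuated robot arm with sparse rewards.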

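Proposed Method

Train an actor-critic network in simulation, with RGB input and joint-velocity output.
Instantiate a new column (network) for the real-robot task, with lateral connections from the simulation column.
Initialize the real-robot column's output layer to mirror the simulation column's, which biases exploration.
Allow columns to have different capacities to accommodate sim-to-real differences.
Evaluate on multiple tasks and under perturbations, comparing progressive transfer, fine-tuning, and learning from scratch.
Demonstrate extension to proprioceptive input by adding a column that uses proprioception while reusing visual features through lateral connections.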
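
The column-plus-lateral-connection idea above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses small dense layers in place of a pixel-based actor-critic network, and every name in it (`init_column`, `init_real_column`, `forward_real`, the layer sizes) is hypothetical. The output-layer initialization shown is one possible reading of "mirror the simulation column": the real column's own output weights start at zero and its lateral output weights are copied from the simulation column, so that the new column initially reproduces the simulation policy.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_column(rng, sizes):
    """One column = list of (W, b) pairs; sizes = [input, hidden..., output]."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward_sim(col, x):
    """Run the (frozen) simulation column and keep every layer's activation."""
    acts = [x]
    for i, (W, b) in enumerate(col):
        z = acts[-1] @ W + b
        acts.append(z if i == len(col) - 1 else relu(z))
    return acts  # acts[i] is the input to layer i; acts[-1] is the output

def init_real_column(rng, sim_col, sim_sizes, real_sizes):
    """New column with its own (possibly smaller) capacity plus lateral weights."""
    col = init_column(rng, real_sizes)
    # laterals[i - 1] maps the sim activation feeding layer i into real layer i.
    laterals = [rng.normal(0.0, 0.1, (sim_sizes[i], real_sizes[i + 1]))
                for i in range(1, len(real_sizes) - 1)]
    # Assumption: mirror the sim output layer through the lateral path.
    W_sim_out, b_sim_out = sim_col[-1]
    col[-1] = (np.zeros_like(col[-1][0]), b_sim_out.copy())
    laterals[-1] = W_sim_out.copy()
    return col, laterals

def forward_real(col, laterals, sim_acts, x):
    """Real column: own weights plus lateral input from the sim column."""
    h = x
    for i, (W, b) in enumerate(col):
        z = h @ W + b
        if i > 0:
            z = z + sim_acts[i] @ laterals[i - 1]
        h = z if i == len(col) - 1 else relu(z)
    return h

rng = np.random.default_rng(0)
sim_sizes, real_sizes = [8, 16, 16, 4], [8, 6, 6, 4]  # illustrative sizes only
sim_col = init_column(rng, sim_sizes)
real_col, laterals = init_real_column(rng, sim_col, sim_sizes, real_sizes)

x = rng.normal(size=(2, 8))                # stand-in for image features
sim_acts = forward_sim(sim_col, x)
real_out = forward_real(real_col, laterals, sim_acts, x)
```

In this sketch the real column is narrower than the simulation column, yet its initial output equals the simulation column's output, so exploration on the robot would start from the simulated behaviour. During training on the real task only `real_col` and `laterals` would be updated, while `sim_col` stays fixed.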

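Experimental Results

Research Questions

RQ1: When trained with pixel input and sparse rewards, can progressive networks transfer policies learned in simulation to a real robot?
RQ2: Do progressive networks enable faster and more stable learning on the real robot than fine-tuning or learning from scratch?
RQ3: Within the progressive network framework, how does adding or changing input modalities (for example, proprioception) affect transfer performance?
RQ4: Is the approach robust to environmental perturbations and curriculum-like task variations?

Key Findings

The progressive second column achieves higher performance on the real robot (34 points) than the fine-tuned column or the from-scratch baseline.
A randomly initialized column fails to learn on the real robot, showing that transfer is needed as scaffolding.
Under environment changes, progressive networks show greater stability and higher final performance than fine-tuning.
Adding a new column integrates proprioceptive input while reusing visual features through lateral connections, which improves transfer on dynamic tasks.
Transfer through progressive networks exploits simulation-trained features and reduces the training time required compared with training on the real robot from scratch.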

This review was generated by AI and reviewed by a human editor.