QUICK REVIEW

[Paper Review] Bayesian Domain Randomization for Sim-to-Real Transfer

Fabio Muratore, Christian Eilers|arXiv (Cornell University)|Mar 5, 2020

Reinforcement Learning in Robotics | References: 13 | Cited by: 11

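One-Sentence Summary

BayRn (Bayesian Domain Randomization) is a sim-to-real transfer method that adapts the distribution over domain parameters during training, using Bayesian optimization guided by samples from the real-world target domain. It reduces reliance on prior knowledge and enables direct, robust policy transfer to a real robot, outperforming fixed-distribution domain randomization on a nonlinear swing-up task.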

ABSTRACT

When learning policies for robot control, the real-world data required is typically prohibitively expensive to acquire, so learning in simulation is a popular strategy. Unfortunately, such policies are often not transferable to the real world due to a mismatch between the simulation and reality, called the 'reality gap'. Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) according to a distribution over domain parameters during training in order to obtain more robust policies that are able to overcome the reality gap. Most domain randomization approaches sample the domain parameters from a fixed distribution. This solution is suboptimal in the context of sim-to-real transferability, since it yields policies that have been trained without explicitly optimizing for the reward on the real system (target domain). Additionally, a fixed distribution assumes there is prior knowledge about the uncertainty over the domain parameters. Thus, we propose Bayesian Domain Randomization (BayRn), a black-box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution during learning by sampling the real-world target domain. BayRn utilizes Bayesian optimization to search the space of source domain distribution parameters which produce a policy that maximizes the real-world objective, allowing for adaptive distributions during policy optimization. We experimentally validate the proposed approach by comparing against two baseline methods on a nonlinear under-actuated swing-up task. Our results show that BayRn is capable of performing direct sim-to-real transfer, while significantly reducing the required prior knowledge.

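Research Motivation and Goals

Address the sim-to-real policy transfer problem, in which policies trained in simulation fail on the real system because of the 'reality gap'.
Overcome the limitation of fixed-distribution domain randomization, which assumes prior knowledge about the uncertainty over the domain parameters.
Achieve efficient sim-to-real transfer by adapting the source domain distribution during training so as to maximize real-world performance.
Reduce reliance on expert-specified priors over domain parameters by learning a suitable distribution through interaction with the real system.
Develop a black-box, end-to-end method that optimizes directly for real-world reward without explicitly modeling the reality gap.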

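Proposed Method

BayRn uses Bayesian optimization to search over the parameters of the source domain distribution during policy training.
During training, domain parameters (e.g., mass, friction, gravity) are sampled from a distribution that is updated adaptively based on real-world performance feedback.
The algorithm treats the distribution parameters as hyperparameters to be optimized, with the real-world reward as the objective function.
It uses a probabilistic surrogate model (e.g., a Gaussian process) to model the relationship between the distribution parameters and the real-world performance of the resulting policy.
The method operates as a black box and requires no architectural changes to the policy network or the simulator.
Policy performance is evaluated with real-world rollouts, which guide the optimization of the domain distribution and minimize reliance on prior assumptions.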
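
The loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-dimensional "mass" parameter, the stand-in functions `train_policy_in_sim` and `real_rollout_return`, the hidden `REAL_MASS`, the fixed sampling standard deviation, the small NumPy Gaussian process, and the upper-confidence-bound acquisition rule are all hypothetical choices made to keep the example self-contained. The paper's actual simulator, policy optimizer, and acquisition function may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
REAL_MASS = 1.3  # hypothetical hidden target-domain parameter

def train_policy_in_sim(phi, n_domains=200):
    """Stand-in for policy training under domain randomization."""
    # Sample domain parameters xi ~ N(phi, 0.05^2); phi is the distribution mean.
    masses = rng.normal(loc=phi, scale=0.05, size=n_domains)
    # Toy "policy": a single gain matched to the average sampled mass.
    return masses.mean()

def real_rollout_return(theta):
    """Stand-in for a noisy return estimate from real-world rollouts."""
    return -(theta - REAL_MASS) ** 2 + rng.normal(scale=0.01)

def rbf(a, b, length=0.3):
    """Squared-exponential kernel with unit signal variance."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-3):
    """Posterior mean and std of a GP surrogate (constant mean = data mean)."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_query)
    y_mean = y_train.mean()
    mu = y_mean + K_s.T @ np.linalg.solve(K, y_train - y_mean)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def bayrn_sketch(n_init=3, n_iter=10, bounds=(0.5, 2.0)):
    candidates = np.linspace(*bounds, 151)
    # Initial phase: random distribution parameters, each evaluated on the "real" system.
    phis = list(rng.uniform(*bounds, size=n_init))
    returns = [real_rollout_return(train_policy_in_sim(p)) for p in phis]
    for _ in range(n_iter):
        mu, sigma = gp_posterior(np.array(phis), np.array(returns), candidates)
        # Assumed acquisition rule: upper confidence bound on the surrogate.
        phi_next = candidates[np.argmax(mu + 2.0 * sigma)]
        phis.append(phi_next)
        returns.append(real_rollout_return(train_policy_in_sim(phi_next)))
    # Simplification: report the best observed pair rather than the surrogate optimum.
    best = int(np.argmax(returns))
    return phis[best], returns[best]

best_phi, best_return = bayrn_sketch()
print(f"best phi = {best_phi:.3f}, real return = {best_return:.4f}")
```

Each iteration costs one full policy training run in simulation plus one real-world evaluation, which is why a sample-efficient search such as Bayesian optimization is a natural fit for the outer loop.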

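Experimental Results

Research Questions

RQ1: Does learning an adaptive domain distribution improve sim-to-real policy transfer compared with fixed-distribution domain randomization?
RQ2: To what extent does BayRn reduce the need for prior knowledge about domain parameter uncertainty in sim-to-real transfer?
RQ3: How effective is BayRn at transferring policies directly to a real-world robot system without fine-tuning?
RQ4: Does Bayesian optimization over the domain distribution yield faster convergence and better real-world performance than the baseline methods?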

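Key Findings

BayRn achieves direct sim-to-real transfer on a nonlinear under-actuated swing-up task without any real-world fine-tuning.
Compared with standard domain randomization, the method significantly reduces reliance on prior knowledge about the domain parameter distribution.
BayRn achieves higher real-world performance than the baseline domain randomization methods on metrics such as task success rate.
Adaptive distribution learning via Bayesian optimization leads to faster convergence and more robust real-world policies.
The method shows that real-world feedback can effectively guide the learning of the source domain distribution and thereby improve transferability.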

This review was generated by AI and reviewed by human editors.