QUICK REVIEW

[Paper Review] Active Domain Randomization

Bhairav Mehta, Manfred Diaz|arXiv (Cornell University)|Apr 9, 2019

Domain Adaptation and Few-Shot Learning|References 32|Cited by 33

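One-Sentence Summary

Active Domain Randomization (ADR) learns a parameter sampling strategy that focuses training on the most informative environment variations, improving generalization and robustness relative to Uniform Domain Randomization (UDR).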

ABSTRACT

Domain randomization is a popular technique for improving domain transfer, often used in a zero-shot setting when the target domain is unknown or cannot easily be used for training. In this work, we empirically examine the effects of domain randomization on agent generalization. Our experiments show that domain randomization may lead to suboptimal, high-variance policies, which we attribute to the uniform sampling of environment parameters. We propose Active Domain Randomization, a novel algorithm that learns a parameter sampling strategy. Our method looks for the most informative environment variations within the given randomization ranges by leveraging the discrepancies of policy rollouts in randomized and reference environment instances. We find that training more frequently on these instances leads to better overall agent generalization. Our experiments across various physics-based simulated and real-robot tasks show that this enhancement leads to more robust, consistent policies.

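Research Motivation and Goals

Investigate why uniform randomization leads to high-variance, suboptimal policies.
Propose ADR, which learns informative environment variations during training.
Demonstrate ADR's effectiveness on simulated and real-robot tasks and in high-dimensional parameter spaces.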

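Proposed Method

Formulate domain randomization (DR) as a reinforcement learning problem in which the sampling strategy is optimized with Stein Variational Policy Gradient (SVPG).
Use a discriminator to measure the discrepancy between trajectories from the reference environment and from randomized environments, which provides the learning signal.
Train a set of SVPG particles that propose diverse, informative randomized environments.
Update the agent policy on the environments proposed by the SVPG particles, while updating the discriminator to guide sampling.
Apply ADR across multiple environments and demonstrate stronger generalization and robustness.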
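The discriminator-based learning signal described above can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes a hypothetical 1-D point mass whose only randomized parameter is `friction`, a fixed open-loop rollout in place of the agent policy, a one-feature logistic regression as the discriminator, and a reference environment placed at the low end of the range (`REFERENCE_FRICTION`). SVPG itself is not implemented. The sketch only shows how a reward of the form log D(randomized | rollout) favors environments whose rollouts differ most from the reference.

```python
import math

REFERENCE_FRICTION = 0.1  # hypothetical reference environment parameter


def rollout_feature(friction, steps=10):
    """Roll out a toy 1-D point mass and return its final position."""
    position, velocity = 0.0, 1.0
    for _ in range(steps):
        velocity *= 1.0 - friction  # friction damps the velocity each step
        position += velocity
    return position


def log_sigmoid(z):
    """Return log(sigmoid(z)), written to avoid computing log of a tiny number."""
    return -math.log1p(math.exp(-z))


def train_discriminator(randomized_frictions, lr=0.05, epochs=500):
    """Fit logistic regression: label 1 = randomized, label 0 = reference."""
    ref = rollout_feature(REFERENCE_FRICTION)
    data = [(rollout_feature(f), 1.0) for f in randomized_frictions]
    data += [(ref, 0.0)] * len(randomized_frictions)  # balance the two classes
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        # Plain batch gradient descent on the mean logistic loss.
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b


def sampler_reward(friction, w, b):
    """Reward for proposing `friction`: log-probability that the
    discriminator assigns to the rollout being from a randomized environment."""
    return log_sigmoid(w * rollout_feature(friction) + b)


w, b = train_discriminator([0.1, 0.3, 0.5, 0.7, 0.9])
rewards = {f: sampler_reward(f, w, b) for f in (0.2, 0.9)}
```

In this toy setting, proposing `friction = 0.9` earns a higher reward than `friction = 0.2`, because its rollout is easier to tell apart from the reference rollout. In ADR proper, a signal of this kind would drive the SVPG particles rather than a hand-picked list of parameters.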

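Experimental Results

Research Questions

RQ1: Does uniform sampling of randomization parameters lead to suboptimal generalization compared with targeted sampling?
RQ2: Does ADR improve generalization and reduce policy variance across tasks and across randomization spaces of different dimensionality?
RQ3: Does the discriminator-based reward effectively steer the sampling strategy toward more informative environments?
RQ4: Are policies trained with ADR more robust than policies trained with UDR in sim-to-real transfer?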

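Key Findings

In difficult scenarios where UDR fails, ADR reaches or approaches expert-level generalization.
ADR yields policies with lower variance and more consistent performance across environments.
ADR scales to high-dimensional randomization spaces and improves sim-to-real transfer without requiring target-domain rewards.
Training more frequently on problematic environments gives better overall generalization than uniform sampling.
ADR offers interpretability by highlighting which regions of the environment parameter space are difficult and need more training.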

This review was generated by AI and reviewed by human editors.