QUICK REVIEW

[Paper Review] GRIMGEP: Learning Progress for Robust Goal Sampling in Visual Deep Reinforcement Learning

Grgur Kovač, Adrien Laversanne-Finot|arXiv (Cornell University)|Aug 10, 2020

Reinforcement Learning in Robotics|References: 34|Cited by: 7

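One-Sentence Summary

GRIMGEP proposes a framework that combines absolute learning progress (ALP) with novelty-based exploration in visual deep reinforcement learning, using clustering to guide goal sampling. By prioritizing regions with high learning progress and applying novelty search within those clusters, GRIMGEP reduces distraction from noisy, hard-to-learn goals (such as a flickering TV) and significantly improves sample efficiency and final performance in a complex 3D image-based environment.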

ABSTRACT

Designing agents, capable of learning autonomously a wide range of skills is critical in order to increase the scope of reinforcement learning. It will both increase the diversity of learned skills and reduce the burden of manually designing reward functions for each skill. Self-supervised agents, setting their own goals, and trying to maximize the diversity of those goals have shown great promise towards this end. However, a currently known limitation of agents trying to maximize the diversity of sampled goals is that they tend to get attracted to noise or more generally to parts of the environments that cannot be controlled (distractors). When agents have access to predefined goal features or expert knowledge, absolute Learning Progress (ALP) provides a way to distinguish between regions that can be controlled and those that cannot. However, those methods often fall short when the agents are only provided with raw sensory inputs such as images. In this work we extend those concepts to unsupervised image-based goal exploration. We propose a framework that allows agents to autonomously identify and ignore noisy distracting regions while searching for novelty in the learnable regions to both improve overall performance and avoid catastrophic forgetting. Our framework can be combined with any state-of-the-art novelty seeking goal exploration approaches. We construct a rich 3D image based environment with distractors. Experiments on this environment show that agents using our framework successfully identify interesting regions of the environment, resulting in drastically improved performances. The source code is available at https://sites.google.com/view/grimgep.

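Research Motivation and Objectives

Address the challenge of goal exploration in complex, image-based environments with distractors, where novelty-based methods fail.
Extend learning-progress-based (ALP) curriculum learning to visual DRL, overcoming the difficulty of applying it directly to high-dimensional observations.
Improve the robustness and sample efficiency of novelty-based exploration algorithms through a high-level ALP-based curriculum.
Show that combining ALP-driven region selection with novelty-based goal sampling outperforms either approach used alone.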

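Proposed Method

GRIMGEP uses a clustering VAE to partition the visual observation space into regions based on latent representations.
Learning progress (LP) is estimated within each cluster to identify the regions where the agent is currently learning most actively.
The framework selects clusters with high learning progress for goal sampling, keeping the focus on learnable, informative tasks.
Within each selected cluster, novelty-based exploration (Skewfit or CountBased) is used to sample goals, ensuring diversity within promising regions.
The method is implemented with an online-trained clustering VAE and cluster selection based on a Gaussian mixture model (GMM), with hyperparameters tuned via AIC.
It integrates into existing IMGEP frameworks as a prior mechanism that steers exploration toward relevant, learnable regions of the environment.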
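
The two-level sampling described above (pick a cluster by learning progress, then pick a novel goal inside it) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `ClusterALPSampler`, the function `sample_goal_count_based`, the sliding-window ALP estimate, the `epsilon` uniform mixing, and the inverse-count novelty weighting are all assumptions chosen for illustration. Cluster assignments are taken as given here, whereas the paper obtains them from a clustering VAE.

```python
import random
from collections import defaultdict, deque


class ClusterALPSampler:
    """Hypothetical helper: per-cluster ALP tracking and cluster sampling."""

    def __init__(self, window=20, epsilon=0.2, seed=0):
        self.window = window      # assumed size of each comparison window
        self.epsilon = epsilon    # assumed share of uniform cluster sampling
        self.rng = random.Random(seed)
        # Keep only the last 2 * window outcomes for every cluster.
        self.history = defaultdict(lambda: deque(maxlen=2 * window))

    def record(self, cluster, performance):
        """Store the outcome (e.g. success in [0, 1]) of a goal in `cluster`."""
        self.history[cluster].append(performance)

    def alp(self, cluster):
        """Absolute difference between recent and older mean performance."""
        h = list(self.history[cluster])
        if len(h) < 2 * self.window:
            return 0.0  # not enough data yet
        older, recent = h[:self.window], h[self.window:]
        return abs(sum(recent) / len(recent) - sum(older) / len(older))

    def cluster_probs(self, clusters):
        """Mix ALP-proportional probabilities with a uniform distribution."""
        alps = [self.alp(c) for c in clusters]
        total = sum(alps)
        uniform = 1.0 / len(clusters)
        if total == 0:
            return [uniform] * len(clusters)
        return [self.epsilon * uniform + (1 - self.epsilon) * a / total
                for a in alps]

    def sample_cluster(self, clusters):
        weights = self.cluster_probs(clusters)
        return self.rng.choices(clusters, weights=weights, k=1)[0]


def sample_goal_count_based(goals, visit_counts, rng, alpha=1.0):
    """Stand-in novelty sampler: rarely visited goals get larger weights."""
    weights = [(1.0 + visit_counts.get(g, 0)) ** -alpha for g in goals]
    return rng.choices(goals, weights=weights, k=1)[0]


if __name__ == "__main__":
    sampler = ClusterALPSampler(window=5, epsilon=0.2, seed=0)
    for i in range(10):
        sampler.record("objects", i / 10)  # performance steadily improves
        sampler.record("tv", 0.5)          # performance never changes
    clusters = ["objects", "tv"]
    print(sampler.cluster_probs(clusters))
    chosen = sampler.sample_cluster(clusters)
    goals = {"objects": ["g0", "g1", "g2"], "tv": ["t0", "t1"]}
    print(chosen, sample_goal_count_based(goals[chosen], {"g0": 5}, sampler.rng))
```

In this toy setup the cluster whose performance is changing receives most of the sampling probability, while the flat cluster keeps only its share of the uniform term. A real distractor would yield noisy rather than perfectly constant outcomes, so its estimated ALP would be small but not exactly zero.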

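Experimental Results

Research Questions

RQ1: How do current novelty-based goal exploration algorithms (Skewfit, CountBased) perform in the presence of action-induced distractors (such as a flickering TV)?
RQ2: Can an ALP-based curriculum improve the robustness and performance of novelty-based exploration in image-based DRL?
RQ3: How does GRIMGEP's ALP-guided cluster selection compare with uniform cluster sampling in exploration efficiency and final performance?
RQ4: To what extent does combining ALP with novelty search reduce the attraction to unlearnable, distracting goals?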

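Key Findings

GRIMGEP significantly reduces the proportion of goals sampled from the distractor-rich TV room, shifting attention to the object room, where meaningful skills can be learned.
GRIM-Skewfit and GRIM-CountBased significantly outperform their unwrapped baselines in final performance, for example reaching an 80% success rate on object-room goals, whereas the baselines are mostly distracted by the TV.
Using OnlineRIG (uniform sampling) inside GRIMGEP improves performance but remains unsatisfactory, indicating that ALP guidance alone is not enough for the best performance and must be combined with an intrinsic exploration incentive.
Ablation experiments confirm that ALP-based cluster sampling outperforms uniform cluster sampling, with significantly higher success rates and more consistent exploration of the object room.
The framework detects and prioritizes relevant, learnable regions of the environment, even without expert knowledge or dense reward design.
GRIMGEP lets the agent autonomously build a complex, adaptive curriculum that avoids unlearnable tasks and focuses on the regions with the highest learning progress.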

This review was generated by AI and reviewed by a human editor.