QUICK REVIEW

[Paper Review] Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies

Alexander F. Sax, Bradley Emi|arXiv (Cornell University)|Dec 31, 2018

Visual perception and processing mechanisms | References: 82 | Cited by: 49

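One-Sentence Summary

This paper shows that a frozen set of mid-level visual features improves the sample efficiency and generalization of reinforcement-learning-based visuomotor policies, and it proposes a max-coverage feature selector that yields a compact feature set covering a range of downstream tasks.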

ABSTRACT

How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. delivering a package)? We study this question by integrating a generic perceptual skill set (e.g. a distance estimator, an edge detector, etc.) within a reinforcement learning framework--see Figure 1. This skill set (hereafter mid-level perception) provides the policy with a more processed state of the world compared to raw images. We find that using a mid-level perception confers significant advantages over training end-to-end from scratch (i.e. not leveraging priors) in navigation-oriented tasks. Agents are able to generalize to situations where the from-scratch approach fails and training becomes significantly more sample efficient. However, we show that realizing these gains requires careful selection of the mid-level perceptual skills. Therefore, we refine our findings into an efficient max-coverage feature set that can be adopted in lieu of raw images. We perform our study in completely separate buildings for training and testing and compare against visually blind baseline policies and state-of-the-art feature learning methods.

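Research Motivation and Goals

Evaluate whether mid-level visual features improve sample efficiency on RL-based visuomotor tasks.
Evaluate how well feature-based policies generalize to unseen environments.
Determine whether a single fixed feature suffices across multiple tasks or whether a set of features is needed.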

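Proposed Method

Freeze and reuse pretrained mid-level visual encoders, which transform raw observations before they reach the RL policy.
Train policies on the feature-augmented observations using PPO with off-policy correction.
Evaluate 20 mid-level features on navigation, exploration, and planning tasks in the Gibson environment, with training and test splits drawn from different buildings.
Quantify performance as reward relative to a blind baseline, which accounts for task difficulty.
Propose a max-coverage feature selector that picks a compact feature subset minimizing the worst-case transfer distance.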
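
The selection step in the last item can be sketched as a small minimax problem. The example below is an illustration only, not the authors' implementation: it assumes a precomputed matrix of pairwise transfer distances between features (the values and feature names here are made up), and it swaps in a brute-force search over subsets, which is only practical for a small number of features. The function names `worst_case_distance` and `max_coverage_select` are hypothetical.

```python
from itertools import combinations


def worst_case_distance(subset, dist):
    """Largest distance from any feature to its nearest selected feature.

    dist[i][j] is an assumed transfer distance from feature i to feature j.
    """
    n = len(dist)
    return max(min(dist[i][j] for i in subset) for j in range(n))


def max_coverage_select(dist, k):
    """Exhaustively pick k features that minimize the worst-case distance.

    Brute force is an assumption made for clarity; it enumerates every
    size-k subset, so it only suits small feature counts.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of features")
    best = min(
        combinations(range(n), k),
        key=lambda subset: worst_case_distance(subset, dist),
    )
    return list(best), worst_case_distance(best, dist)


# Toy, made-up distances between four hypothetical features.
names = ["depth", "edges", "normals", "objects"]
dist = [
    [0.0, 0.6, 0.2, 0.9],
    [0.6, 0.0, 0.5, 0.8],
    [0.2, 0.5, 0.0, 0.9],
    [0.9, 0.8, 0.9, 0.0],
]

chosen, radius = max_coverage_select(dist, 2)
print([names[i] for i in chosen], radius)  # ['normals', 'objects'] 0.5
```

With a budget of two features, the toy matrix selects one geometric and one semantic feature, because leaving out the distant "objects" feature would make the worst case larger. Raising `k` can only shrink the worst-case distance, which is the trade-off between compactness and coverage that the selector is meant to manage.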

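Experimental Results

Research Questions

RQ1: Do mid-level visual features speed up learning (sample efficiency) compared with learning from scratch?
RQ2: Do mid-level features improve generalization to unseen environments?
RQ3: Is a single fixed feature sufficient for all downstream visuomotor tasks, or is a diverse feature set needed?
RQ4: Can a compact feature subset maintain performance while reducing data and compute?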

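Key Findings

Mid-level features led to faster learning than from-scratch policies on the tasks tested.
Several feature-based agents achieved higher final performance in unseen test environments than policies trained from scratch.
Rank reversals show that no universal feature exists; the best feature depends on the downstream task (semantic features for navigation, geometric features for exploration).
The max-coverage feature selector produces compact feature sets that approach or exceed the best task-specific feature while using less data.
The feature set also generalizes across multiple buildings and to a second simulator (VizDoom), supporting the method's applicability across diverse settings.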


This review was generated by AI and reviewed by human editors.