QUICK REVIEW

[Paper Review] Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal

Jake Bruce, Niko Sünderhauf|arXiv (Cornell University)|Jul 11, 2018

Robotic Path Planning Algorithms | References: 32 | Cited by: 24

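ONE-SENTENCE SUMMARY

This paper presents a method for training deployable, goal-directed navigation policies from a single coverage traversal of real-world data. By precomputing visual embeddings and applying efficient stochastic augmentation in feature space, the method trains at more than 20,000 transitions per second on a commodity desktop computer and deploys zero-shot on a real robot, navigating a 2 km heterogeneous environment without fine-tuning.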

ABSTRACT

Model-free reinforcement learning has recently been shown to be effective at learning navigation policies from complex image input. However, these algorithms tend to require large amounts of interaction with the environment, which can be prohibitively costly to obtain on robots in the real world. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot, from only a single coverage traversal of recorded data. The navigation agent learns an effective policy over a diverse action space in a large heterogeneous environment consisting of more than 2km of travel, through buildings and outdoor regions that collectively exhibit large variations in visual appearance, self-similarity, and connectivity. We compare pretrained visual encoders that enable precomputation of visual embeddings to achieve a throughput of tens of thousands of transitions per second at training time on a commodity desktop computer, allowing agents to learn from millions of trajectories of experience in a matter of hours. We propose multiple forms of computationally efficient stochastic augmentation to enable the learned policy to generalise beyond these precomputed embeddings, and demonstrate successful deployment of the learned policy on the real robot without fine tuning, despite environmental appearance differences at test time. The dataset and code required to reproduce these results and apply the technique to other datasets and robots is made publicly available at rl-navigation.github.io/deployable.

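MOTIVATION AND GOALS

Reduce the amount of real-world data needed to train navigation policies in large, complex environments.
Enable efficient, high-throughput reinforcement learning from a single recorded traversal by a real robot.
Deploy the trained policy zero-shot on the real robot, even when visual appearance and viewpoint differ between training and test conditions.
Release the dataset and code publicly to ensure reproducibility and support wider use.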

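PROPOSED METHOD

Precompute visual embeddings from a single robot traversal with a pretrained image encoder, enabling fast inference and high-throughput training.
Apply stochastic augmentation on the fly in feature space, including random frame selection, image rotation, and noise injection, to simulate varied visual conditions.
Use a curriculum learning strategy to keep coverage of the navigation graph balanced during training.
Train a model-free reinforcement learning agent with A3C, using dense reward shaping based on distance to the goal.
Represent the environment as a graph with nodes spaced 1 m apart, which defines the navigation states.
Apply correlated noise to global features and uncorrelated noise to local features, to simulate perceptual differences and improve robustness.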
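
The graph, reward shaping, and noise steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names, the `scale` and `goal_bonus` constants, the noise magnitudes, and the way features are split into "global" and "local" lists are all assumptions made for the example; this review does not specify them.

```python
import random
from collections import deque


def distances_to_goal(adjacency, goal):
    """Breadth-first hop count from every reachable node to `goal`.

    With nodes spaced about 1 m apart, a hop count approximates metres.
    """
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def shaped_reward(dist, node, next_node, scale=0.1, goal_bonus=1.0):
    """Dense reward: progress toward the goal, plus a bonus on arrival.

    `scale` and `goal_bonus` are illustrative values, not the paper's.
    """
    reward = scale * (dist[node] - dist[next_node])
    if dist[next_node] == 0 and dist[node] != 0:
        reward += goal_bonus
    return reward


def augment_features(global_features, local_features, rng,
                     shared_sigma=0.1, independent_sigma=0.05):
    """Add noise to precomputed features without touching the images.

    Assumption: "correlated" noise is one offset shared by every global
    dimension; "uncorrelated" noise is drawn independently per local dimension.
    """
    shared = rng.gauss(0.0, shared_sigma)
    noisy_global = [x + shared for x in global_features]
    noisy_local = [x + rng.gauss(0.0, independent_sigma) for x in local_features]
    return noisy_global, noisy_local


# Toy corridor 0-1-2-3 with a side branch 1-4 (undirected adjacency lists).
adjacency = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
dist = distances_to_goal(adjacency, goal=3)
step_toward = shaped_reward(dist, node=1, next_node=2)  # one hop closer
step_away = shaped_reward(dist, node=1, next_node=4)    # one hop farther
noisy_global, noisy_local = augment_features([0.2, -0.5], [1.0], random.Random(0))
```

Because the embeddings are precomputed, each training step in a setup like this only does a table lookup and a few additions, which is consistent with the high training throughput the summary reports.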

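EXPERIMENTAL RESULTS

Research questions

RQ1: Can a navigation policy be trained successfully from a single real-world traversal, without extensive real-world interaction?
RQ2: Can high-throughput training be achieved on commodity hardware using only precomputed visual features and efficient data augmentation?
RQ3: Can the policy be deployed in the real world without fine-tuning when visual appearance and viewpoint differ between training and test conditions?
RQ4: How does the deployed policy compare with the virtual agent in path efficiency and success rate?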

Key findings

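Training ran at more than 20,000 transitions per second on a commodity desktop computer (roughly 72 million transitions per hour), which lets the agent learn from millions of trajectories of experience in a matter of hours.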
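The trained policy navigated a 2 km heterogeneous environment (mixed indoor and outdoor) on the real robot, reaching goal images without any fine-tuning.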
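Deployed trajectories averaged 2.42 times the optimal path length, which remains within a reasonable range relative to the virtual agent (1.14 times).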
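The policy generalised to unseen visual conditions, including changes in lighting, shadows, and viewpoint, as shown by successful navigation in real-world test scenarios.
The method scales to larger real-world environments than prior work, achieving kilometer-scale navigation from a single traversal.
The dataset and code are publicly available at rl-navigation.github.io/deployable to support reproducibility and reuse.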
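
A path-efficiency figure such as the 2.42 and 1.14 ratios above can be computed along these lines. This is a minimal sketch under stated assumptions: it averages per-episode ratios (the review does not say whether the paper averages ratios or divides total lengths), the helper names are hypothetical, and the coordinates are made up for illustration.

```python
import math


def path_length(points):
    """Total Euclidean length of a polyline given as (x, y) points in metres."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def mean_length_ratio(episodes):
    """Mean of travelled length / optimal length over (travelled, optimal) pairs."""
    ratios = []
    for travelled, optimal in episodes:
        optimal_length = path_length(optimal)
        if optimal_length == 0:
            raise ValueError("optimal path must have non-zero length")
        ratios.append(path_length(travelled) / optimal_length)
    return sum(ratios) / len(ratios)


# Made-up coordinates for illustration only; not data from the paper.
episodes = [
    ([(0, 0), (0, 3), (4, 3)], [(0, 0), (4, 3)]),  # 7 m travelled vs 5 m optimal
    ([(0, 0), (2, 0)], [(0, 0), (2, 0)]),          # already optimal
]
ratio = mean_length_ratio(episodes)
```

A ratio of 1.0 means the agent followed the shortest route exactly; larger values indicate detours or backtracking.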

This review was generated by AI and checked by human editors.