QUICK REVIEW

[Paper Review] The StreetLearn Environment and Dataset

Piotr Mirowski, Andras Banki-Horvath | arXiv (Cornell University) | Mar 4, 2019

Multimodal Machine Learning Applications | References: 26 | Cited by: 49

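One-sentence summary

The paper introduces StreetLearn, an interactive first-person navigation environment built on Google Street View content, and provides baselines for a courier navigation task across multiple city regions. It also releases code and a scalable evaluation framework for end-to-end visual navigation.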

ABSTRACT

Navigation is a rich and well-grounded problem domain that drives progress in many different areas of research: perception, planning, memory, exploration, and optimisation in particular. Historically these challenges have been separately considered and solutions built that rely on stationary datasets - for example, recorded trajectories through an environment. These datasets cannot be used for decision-making and reinforcement learning, however, and in general the perspective of navigation as an interactive learning task, where the actions and behaviours of a learning agent are learned simultaneously with the perception and planning, is relatively unsupported. Thus, existing navigation benchmarks generally rely on static datasets (Geiger et al., 2013; Kendall et al., 2015) or simulators (Beattie et al., 2016; Shah et al., 2018). To support and validate research in end-to-end navigation, we present StreetLearn: an interactive, first-person, partially-observed visual environment that uses Google Street View for its photographic content and broad coverage, and give performance baselines for a challenging goal-driven navigation task. The environment code, baseline agent code, and the dataset are available at http://streetlearn.cc

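Motivation and objectives

Advance end-to-end visual navigation in realistic environments, moving beyond static datasets.
Present StreetLearn as an interactive first-person navigation environment based on Google Street View imagery.
Define a delivery-style courier task and an instruction-following task to probe navigation policies.
Provide a scalable benchmark with per-region evaluation, curriculum settings, and transferable agent architectures.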

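Proposed method

Builds StreetLearn from Google Street View panoramas, forming real-world street graphs for two cities (New York City and Pittsburgh).
Defines the observation space as 84x84 RGB crops, with a set of five or six discrete actions for rotating, moving, and zooming.
Formalises the tasks as a goal-driven courier task and instruction-based navigation, where the goal can be given as absolute coordinates or as language-guided instructions.
Proposes two neural architectures (CityNav and MultiCityNav) with a shared encoder and city-specific LSTMs, trained with scalable RL via IMPALA.
Provides an oracle baseline that runs shortest-path breadth-first search (BFS) on the street graph to bound achievable performance from above.
Releases a codebase comprising a C++ engine, protocol buffers, a gym-like Python interface, and TensorFlow agents.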
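
The oracle idea above can be illustrated with a minimal sketch of breadth-first search on a street graph. This is not the released StreetLearn code: the adjacency-dictionary format, the function name `bfs_shortest_path`, and the panorama IDs are hypothetical placeholders, and plain BFS minimises the number of hops between panoramas rather than metric distance.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return the fewest-hops node sequence from start to goal, or None.

    `graph` is assumed to be a dict mapping a node ID to a list of
    neighbouring node IDs (a hypothetical stand-in for the street graph).
    """
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start node.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None  # goal is unreachable from start


# Toy graph with made-up panorama IDs.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
path = bfs_shortest_path(toy_graph, "A", "E")
print(path)
```

An agent that always follows such a path gives a reference return for each region, which is how the Oracle column in the results table can be read.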

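Experimental results

Research questions

RQ1: Can end-to-end navigation policies be learned directly from visual input on realistic Street View graphs?
RQ2: How well do region-specific and multi-city architectures generalise and transfer across different city regions?
RQ3: How do curriculum learning and goal representation affect long-range navigation performance?
RQ4: What bound does imitation or ground-truth guidance (the oracle) place on the performance of learned policies?
RQ5: Does the goal specification (absolute latitude/longitude vs. landmarks) affect navigation performance?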

Main findings

City	Oracle	Single	Joint	Transfer
Wall Street	809	782	745	541
Union Square	750	721	681	667
Hudson River	721	615	621	601
CMU	755	473	313	355
Allegheny	760	669	571	562
South Shore	737	1	-	-

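In the New York regions, agents trained per region reach 85%-97% of the oracle return.
Performance drops in the Pittsburgh regions, especially South Shore; the reasons given include elevation and road topology, which affect the curriculum design.
Joint training across multiple regions and cities costs only a small drop relative to region-specific training in most regions (for example, 782 to 745 at Wall Street), although the gap is larger at CMU (473 to 313).
Transfer experiments show that freezing the encoder and policy components and updating only the goal LSTM transfers to new regions with a moderate loss.
The oracle (shortest path) gives the upper bound on achievable performance in each region.
In at least one region (Union Square), the latitude/longitude goal representation outperforms landmark-based goals.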
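
The percentages quoted in these findings can be re-derived from the results table. The short sketch below does that arithmetic; the numbers are copied from the table, while the dictionary layout and the helper name `fraction_of_oracle` are illustrative choices, not part of the paper or its code. Missing entries ("-" in the table) are represented as `None`.

```python
# (oracle, single, joint, transfer) returns, copied from the table.
results = {
    "Wall Street": (809, 782, 745, 541),
    "Union Square": (750, 721, 681, 667),
    "Hudson River": (721, 615, 621, 601),
    "CMU": (755, 473, 313, 355),
    "Allegheny": (760, 669, 571, 562),
    "South Shore": (737, 1, None, None),
}


def fraction_of_oracle(row):
    """Express each learned agent's return as a fraction of the oracle return."""
    oracle, single, joint, transfer = row
    values = {"single": single, "joint": joint, "transfer": transfer}
    return {
        name: (None if value is None else value / oracle)
        for name, value in values.items()
    }


ratios = {city: fraction_of_oracle(row) for city, row in results.items()}

for city, r in ratios.items():
    cells = ", ".join(
        f"{name}={'n/a' if v is None else format(v, '.1%')}"
        for name, v in r.items()
    )
    print(f"{city}: {cells}")
```

For the three New York regions the single-region ratios come out at roughly 97%, 96%, and 85%, which matches the 85%-97% range stated above.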

This review was generated by AI and reviewed by human editors.