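[Paper Explained] Learning to Navigate in Cities Without a Map
This paper introduces StreetLearn, a city-scale visual navigation environment built on Google Street View, together with a dual-pathway, goal-conditioned reinforcement learning architecture that learns to navigate in multiple cities and transfers to new ones.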
Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation ("I am here") and a representation of the goal ("I am going there"). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. The project webpage http://streetlearn.cc contains a video summarising our research and showing the trained agent in diverse city environments and on the transfer task, the form to request the StreetLearn dataset and links to further resources. The StreetLearn environment code is available at https://github.com/deepmind/streetlearn
Motivation and Objectives
- Motivate end-to-end deep RL for navigation in real-world, city-scale environments without maps or GPS.
- Create a Street View-based interactive environment to study long-range, goal-directed navigation.
- Propose a modular, locale-specific neural architecture to balance general navigation policies with city-specific knowledge.
- Demonstrate learned navigation to distant goals in multiple cities and enable transfer to unseen regions.
Proposed Method
- Use StreetLearn, a Street View-based graph of panoramas, as the navigation environment.
- Represent goals via proximity to a fixed set of landmarks using a softmax over distances to landmarks.
- Propose three architectures: GoalNav (the single-pathway baseline), CityNav, and MultiCityNav. The latter two use dual LSTM pathways to separate locale-specific knowledge from general navigation knowledge.
- Train agents with IMPALA, using an auxiliary heading prediction task to aid learning.
- Employ curriculum learning that gradually increases goal distance from 500 m to up to 3.5–5 km depending on city region, with optional reward shaping.
- Provide a modular transfer protocol where a new city pathway is trained while keeping the shared encoder and policy LSTM fixed.
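The landmark-based goal representation described above can be illustrated with a minimal sketch. It assumes the softmax is taken over negatively scaled distances, so that nearer landmarks receive more weight; the function name `goal_encoding` and the scale factor `alpha` (with its illustrative default) are assumptions made for this example and are not specified in this summary.

```python
import math

def goal_encoding(distances_m, alpha=0.002):
    """Encode a goal as a softmax over its distances (in metres) to fixed landmarks.

    alpha is a hypothetical scale factor chosen for illustration; larger values
    concentrate the weight on the nearest landmarks.
    """
    if not distances_m:
        raise ValueError("at least one landmark distance is required")
    # Negate the distances so that closer landmarks get larger logits.
    logits = [-alpha * d for d in distances_m]
    # Subtract the maximum logit for numerical stability.
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A goal that is 100 m, 500 m, and 2 km away from three landmarks.
weights = goal_encoding([100.0, 500.0, 2000.0])
```

The resulting vector sums to one and ranks landmarks by proximity, which gives the agent a goal description that does not depend on map coordinates being passed directly.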
Experimental Results
Research Questions
- RQ1: Can end-to-end RL learn long-range, real-world navigation using only visual input from Street View?
- RQ2: Does a dual-path architecture improve learning efficiency and transfer across multiple cities compared to single-city baselines?
- RQ3: To what extent can a navigation agent trained on several cities transfer to a previously unseen city?
- RQ4: How do curriculum learning and reward shaping affect learning speed and robustness in city-scale navigation?
Key Findings
- CityNav with dual LSTM pathways achieves higher, more stable performance on the courier navigation task across New York, London, and Paris than the single-path GoalNav baseline.
- The dual-pathway architecture enables transfer to unseen city regions: a new locale-specific pathway can be trained for the target region without re-learning the shared components.
- Curriculum learning substantially improves learning efficiency and robustness over reward shaping alone, by gradually expanding reachable goal distances.
- On held-out goals, performance degrades as held-out area size increases, but agents still travel toward goals, indicating approximate landmark-based goal representation generalization.
- Transfer experiments show that pre-training on multiple cities and then transferring to a target city approaches joint multi-city training performance, with four-city pre-training yielding strong transfer results.
- Removing the skip connection between vision and policy harms single-city training but can regularize interfaces during multi-city transfer.
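The curriculum referred to in the findings, in which the allowed goal distance grows from 500 m during training, could be sketched as follows. The linear schedule, the 3.5 km end point used as a default, and the names `max_goal_distance` and `sample_goal` are assumptions for illustration only; the summary does not state the exact schedule.

```python
import random

def max_goal_distance(step, anneal_steps, start_m=500.0, end_m=3500.0):
    """Return the largest allowed goal distance at a given training step.

    Assumes a linear ramp from start_m to end_m over anneal_steps steps,
    then a constant end_m afterwards.
    """
    frac = min(max(step / anneal_steps, 0.0), 1.0)
    return start_m + frac * (end_m - start_m)

def sample_goal(candidates, step, anneal_steps, rng=random):
    """Pick a goal among candidates within the current curriculum radius.

    candidates is a list of (goal_id, distance_m) pairs, where distance_m is
    measured from the agent's current position.
    """
    limit = max_goal_distance(step, anneal_steps)
    eligible = [goal for goal, dist in candidates if dist <= limit]
    if not eligible:
        # Fall back to the nearest goal if nothing lies within the radius.
        return min(candidates, key=lambda c: c[1])[0]
    return rng.choice(eligible)

# Early in training only the nearby goal is eligible.
goal = sample_goal([("near", 300.0), ("far", 4000.0)], step=0, anneal_steps=100)
```

Starting with short trips lets the agent collect rewards early, and widening the radius later exposes it to the long routes that the final task requires.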
This summary was generated by AI and reviewed by a human editor.