QUICK REVIEW

[Paper Review] Learning Exploration Policies for Navigation

Tao Chen, Saurabh Gupta|arXiv (Cornell University)|Mar 5, 2019
Reinforcement Learning in Robotics | References: 39 | Cited by: 37
One-Line Summary

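This paper learns a task-agnostic exploration policy for navigation in realistic 3D environments, using RGB-D input and rewards computed from on-board sensors. By bootstrapping with imitation learning and fine-tuning with a coverage-based intrinsic reward, the policy outperforms geometry-only and curiosity-based baselines and also benefits downstream navigation tasks.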

ABSTRACT

Numerous past works have tackled the problem of task-driven navigation. But, how to effectively explore a new environment to enable a variety of down-stream tasks has received much less attention. In this work, we study how agents can autonomously explore realistic and complex 3D environments without the context of task-rewards. We propose a learning-based approach and investigate different policy architectures, reward functions, and training paradigms. We find that the use of policies with spatial memory that are bootstrapped with imitation learning and finally finetuned with coverage rewards derived purely from on-board sensors can be effective at exploring novel environments. We show that our learned exploration policies can explore better than classical approaches based on geometry alone and generic learning-based exploration techniques. Finally, we also show how such task-agnostic exploration can be used for down-stream tasks. Code and Videos are available at: https://sites.google.com/view/exploration-for-nav.

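Motivation and Objectives

  • Frame exploration for navigating novel environments as a task-agnostic problem.
  • Design a policy architecture that combines RGB-D data with occupancy maps as spatial memory for long-horizon exploration.
  • Propose an intrinsic, coverage-based reward derived from on-board sensors, together with a collision penalty.
  • Investigate a training paradigm that combines imitation learning and reinforcement learning to improve sample efficiency.
  • Demonstrate generalization to unseen environments and usefulness for downstream navigation tasks.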
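The coverage reward with a collision penalty can be illustrated with a short sketch. It assumes the agent keeps a boolean grid marking the map cells it has observed so far, a hypothetical cell size of 5 cm × 5 cm, and a hypothetical penalty of 0.1 per bump; the function name `coverage_reward` and both constants are illustrative, not the paper's implementation.

```python
import numpy as np

CELL_AREA_M2 = 0.05 * 0.05  # hypothetical: each map cell covers 5 cm x 5 cm
COLLISION_PENALTY = 0.1     # hypothetical penalty applied when the bump sensor fires


def coverage_reward(prev_seen: np.ndarray, curr_seen: np.ndarray, bumped: bool) -> float:
    """Per-step reward: area (m^2) newly observed this step minus a bump penalty.

    prev_seen / curr_seen are boolean grids marking cells observed so far
    (before and after the current step).
    """
    # Cells that are observed now but were not observed at the previous step.
    newly_seen = curr_seen & ~prev_seen
    area_gain = float(newly_seen.sum()) * CELL_AREA_M2
    # Subtract the penalty only on steps where the bump sensor reported a collision.
    return area_gain - (COLLISION_PENALTY if bumped else 0.0)
```

Because only newly observed cells count, revisiting already-mapped space earns nothing, which is what pushes the policy toward unexplored areas.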

Proposed Method

  • Propose a recurrent policy that processes egocentric occupancy maps and RGB input to select exploration actions.
  • Transform the allocentric occupancy map into egocentric views at two scales (40 m × 40 m and 4 m × 4 m) and fuse them for CNN-based feature extraction.
  • Train with imitation learning from human exploration trajectories to bootstrap policy learning, then fine-tune with PPO on intrinsic rewards.
  • Define an intrinsic coverage reward based on the increase in map-covered area, combined with a collision penalty from a bump sensor.
  • Use RGB-D observations and a bump sensor in a House3D-based realistic environment to evaluate exploration and downstream navigation.
  • Compare against frontier-based geometric exploration and curiosity-based baselines to assess robustness to sensor/geometry affordance mismatches.
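The two-scale egocentric map input can be sketched as follows. This assumes a hypothetical allocentric grid at 0.1 m per cell, an agent position given in grid coordinates, and, as a simplification, headings restricted to 90-degree steps so that `np.rot90` can stand in for a general rotation. All names (`crop_centered`, `block_max`, `egocentric_maps`) and constants are hypothetical. The 40 m crop is max-pooled down to the same 40 × 40 size as the 4 m crop so the two can be stacked as CNN input channels.

```python
import numpy as np

CELL_M = 0.1    # hypothetical map resolution (metres per grid cell)
OUT_CELLS = 40  # hypothetical side length of each map channel fed to the CNN


def crop_centered(grid: np.ndarray, row: int, col: int, size: int) -> np.ndarray:
    """Return a size x size window around (row, col), zero-padded past the edges."""
    half = size // 2
    padded = np.pad(grid, half, mode="constant", constant_values=0)
    # Cell (row, col) moves to (row + half, col + half) after padding, so the
    # window starting at (row, col) in padded coordinates is centred on it.
    return padded[row:row + size, col:col + size]


def block_max(grid: np.ndarray, factor: int) -> np.ndarray:
    """Downsample by taking the max over non-overlapping factor x factor blocks."""
    h, w = grid.shape
    return grid.reshape(h // factor, factor, w // factor, factor).max(axis=(1, 3))


def egocentric_maps(allo_map: np.ndarray, row: int, col: int, quarter_turns: int) -> np.ndarray:
    """Stack a coarse (40 m) and a fine (4 m) egocentric crop into shape (2, 40, 40)."""
    views = []
    for extent_m in (40.0, 4.0):
        size = int(round(extent_m / CELL_M))  # 400 cells, then 40 cells
        crop = crop_centered(allo_map, row, col, size)
        # Simplification: rotate in 90-degree steps to align with the agent's heading.
        crop = np.rot90(crop, k=quarter_turns)
        # Coarse view: 400 -> 40 cells (factor 10); fine view: factor 1 (unchanged).
        views.append(block_max(crop, size // OUT_CELLS))
    return np.stack(views)
```

Max pooling keeps an obstacle anywhere inside a block visible in the downsampled coarse view; average pooling would be an equally plausible choice under these assumptions.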

Experimental Results

Research Questions

  • RQ1: How can a task-agnostic exploration policy efficiently explore novel 3D environments using on-board sensors?
  • RQ2: Does bootstrapping with human demonstrations plus intrinsic coverage rewards improve exploration quality and sample efficiency?
  • RQ3: Can learned exploration policies generalize to unseen environments and aid downstream navigation tasks?

Key Findings

  • Learning-based exploration with a spatial-memory policy and coverage rewards outperforms purely geometric baselines and curiosity-based exploration under sensor noise and geometry-affordance mismatches.
  • Imitation learning bootstraps performance and reduces variance, with further gains from RL fine-tuning.
  • Integrating RGB and map inputs improves exploration effectiveness compared to using RGB or maps alone.
  • Intrinsic coverage rewards derived from on-board sensor maps facilitate better exploration than extrinsic rewards crafted from environment features.
  • Exploration policies yield measurable benefits for downstream tasks such as localization of goal images and path planning in new environments.
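One way a map built during exploration can serve downstream path planning is sketched below as a generic breadth-first search over cells marked free. It assumes a boolean grid in which unexplored cells are treated as blocked and the agent moves between 4-connected cells; the function `shortest_path_length` is hypothetical and is a stand-in, not the planner used in the paper.

```python
from collections import deque
from typing import Optional, Tuple

import numpy as np


def shortest_path_length(free: np.ndarray, start: Tuple[int, int],
                         goal: Tuple[int, int]) -> Optional[int]:
    """BFS over 4-connected free cells; returns the step count, or None if unreachable."""
    rows, cols = free.shape
    # Unexplored or occupied cells are False, so they can be neither start nor goal.
    if not (free[start] and free[goal]):
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            # Expand only in-bounds, free, not-yet-visited neighbours.
            if 0 <= nr < rows and 0 <= nc < cols and free[nr, nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None
```

Under this assumption, a policy that covers more of the environment leaves more goals reachable on its map, which is one way exploration quality can translate into downstream benefit.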


This review was generated by AI and checked by a human editor.