QUICK REVIEW

[Paper Review] Learning to Explore using Active Neural SLAM

Devendra Singh Chaplot, Dhiraj Gandhi | arXiv (Cornell University) | 2020. 04. 10.

Robot Manipulation and Learning | Citations: 220

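One-line summary

Active Neural SLAM builds a modular, hierarchical exploration system from a learned Neural SLAM module, a Global policy, and a Local policy. It achieves state-of-the-art exploration performance and transfers successfully to the PointGoal task.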

ABSTRACT

This work presents a modular and hierarchical approach to learn policies for exploring 3D environments, called 'Active Neural SLAM'. Our approach leverages the strengths of both classical and learning-based methods, by using analytical path planners with learned SLAM module, and global and local policies. The use of learning provides flexibility with respect to input modalities (in the SLAM module), leverages structural regularities of the world (in global policies), and provides robustness to errors in state estimation (in local policies). Such use of learning within each module retains its benefits, while at the same time, hierarchical decomposition and modular training allow us to sidestep the high sample complexities associated with training end-to-end policies. Our experiments in visually and physically realistic simulated 3D environments demonstrate the effectiveness of our approach over past learning and geometry-based approaches. The proposed model can also be easily transferred to the PointGoal task and was the winning entry of the CVPR 2019 Habitat PointGoal Navigation Challenge.

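Motivation and objectives

Improve exploration efficiency in unknown 3D environments and robustness to state-estimation errors.
Propose a modular architecture that combines a learned SLAM module with classical planning.
Use hierarchical decision-making to reduce sample complexity relative to end-to-end learning.
Demonstrate transfer to the PointGoal task and applicability to the real world.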

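Proposed method

The architecture has three components, a Neural SLAM module, a Global policy, and a Local policy, which interface through the map and an analytical planner.
Neural SLAM consists of a Mapper and a Pose Estimator, which predict an egocentric map and the agent's pose from RGB and sensor data.
The Global policy takes the map and pose and outputs a long-term goal, which a planner based on the Fast Marching Method converts into a short-term goal.
The Local policy is a learned policy with a ResNet18 encoder that maps RGB observations to actions for reaching the short-term goal.
Training is modular: map and pose supervision for the SLAM module, RL for the Global policy, and imitation learning for the Local policy. This keeps training sample-efficient.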
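
The hand-off from a long-term goal to a short-term goal can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it substitutes a breadth-first search on a 4-connected occupancy grid for the Fast Marching Method, and the names `geodesic_distances`, `short_term_goal`, and `lookahead` are hypothetical.

```python
from collections import deque

# 4-connected neighborhood (an assumption; a real planner may use finer steps).
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def geodesic_distances(grid, goal):
    """Return BFS distances (in cells) from every reachable free cell to `goal`.

    grid[r][c] == 1 marks an obstacle, 0 marks free space.
    BFS is a simple stand-in here for the Fast Marching Method.
    Unreachable cells keep the value None.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([tuple(goal)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def short_term_goal(grid, start, long_term_goal, lookahead=2):
    """Pick a cell at most `lookahead` steps along the shortest path to the goal.

    Returns None when the long-term goal cannot be reached from `start`.
    """
    dist = geodesic_distances(grid, long_term_goal)
    r, c = start
    if dist[r][c] is None:
        return None
    for _ in range(lookahead):
        if dist[r][c] == 0:
            break  # already at the long-term goal
        # Move to the neighbor with the smallest remaining distance.
        candidates = []
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]):
                if dist[nr][nc] is not None:
                    candidates.append((dist[nr][nc], (nr, nc)))
        r, c = min(candidates)[1]
    return (r, c)


if __name__ == "__main__":
    # A wall separates the start (top-left) from the goal (bottom-left).
    demo_grid = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 0, 0],
    ]
    print(short_term_goal(demo_grid, (0, 0), (2, 0)))  # (0, 2)
```

In this sketch the Global policy would supply `long_term_goal`, and the returned cell is what a Local policy would be asked to reach. Replanning at every step from the latest map is one simple way to absorb changes as the map is updated.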

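Experimental results

Research questions

RQ1: How can learning be integrated into a classical navigation pipeline to improve exploration efficiency?
RQ2: Does a modular, hierarchical design with learned SLAM and policies outperform end-to-end learned baselines on 3D exploration tasks?
RQ3: Does the approach generalize across domains (e.g., from Gibson to Matterport) and transfer to the PointGoal task without retraining?
RQ4: How does each module (SLAM, Global policy, Local policy) affect performance and robustness to sensor and actuation noise?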

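Key findings

Active Neural SLAM outperforms the baselines on exploration metrics in both the Gibson and MP3D domains.
The hierarchical, modular design reduces the search space and improves sample efficiency relative to end-to-end baselines.
The method generalizes well across domains: policies trained on Gibson transfer to Matterport with improved coverage.
It transfers to PointGoal navigation without additional training and won the CVPR 2019 Habitat PointGoal Navigation Challenge.
Ablation studies show how the Local Policy and pose-estimation supervision contribute to robustness and long-horizon planning.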
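
As a rough illustration of the coverage metric mentioned above, the sketch below counts how much of the traversable area an agent has observed. The definition used here (observed traversable cells, reported as an area and as a fraction of all traversable cells), the 5 cm default cell size, and the function name `coverage` are assumptions for illustration and are not taken from this review.

```python
def coverage(explored, traversable, cell_size_m=0.05):
    """Return (covered_area_m2, covered_fraction) for two same-shaped grids.

    explored[r][c]    is truthy if the agent has observed the cell.
    traversable[r][c] is truthy if the cell is free space in the ground truth.
    cell_size_m is the side length of one grid cell (assumed value).
    """
    seen = 0
    total = 0
    for explored_row, traversable_row in zip(explored, traversable):
        for is_explored, is_traversable in zip(explored_row, traversable_row):
            if is_traversable:
                total += 1
                if is_explored:
                    seen += 1
    area = seen * cell_size_m ** 2
    fraction = seen / total if total else 0.0
    return area, fraction
```

Reporting both values is useful because an absolute area favors large scenes, while the fraction makes scenes of different sizes comparable.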

This review was generated by AI and checked by a human editor.