QUICK REVIEW

[Paper Review] Benchmarking Classic and Learned Navigation in Complex 3D Environments

Dmytro Mishkin, Alexey Dosovitskiy|arXiv (Cornell University)|Jan 30, 2019

Robotics and Sensor-Based Localization | 50 references | 41 citations

TL;DR

The paper compares a classic modular navigation pipeline against a learned agent and human performance across varied indoor 3D environments, finding that RGB-D–equipped classic navigation often outperforms learning-based methods, while learned navigation is more robust with limited sensory input; humans still outperform both.

ABSTRACT

Navigation research is attracting renewed interest with the advent of learning-based methods. However, this new line of work is largely disconnected from well-established classic navigation approaches. In this paper, we take a step towards coordinating these two directions of research. We set up classic and learning-based navigation systems in common simulated environments and thoroughly evaluate them in indoor spaces of varying complexity, with access to different sensory modalities. Additionally, we measure human performance in the same environments. We find that a classic pipeline, when properly tuned, can perform very well in complex cluttered environments. On the other hand, learned systems can operate more robustly with a limited sensor suite. Overall, both approaches are still far from human-level performance.

Motivation & Objective

Assess how classic modular navigation and end-to-end learned navigation perform in cluttered indoor 3D environments.
Evaluate robustness of each approach under different sensor modalities (none, RGB, RGB-D).
Quantify human navigation performance in the same environments for benchmarking.
Investigate whether hybrid (classic+learned) approaches could leverage strengths of both paradigms.

Proposed method

Implement a classic modular navigation pipeline (mapping, localization, planning, locomotion) using ORB-SLAM2 for localization and a D* Lite planner.
Compare with an end-to-end learned agent based on Direct Future Prediction (DFP) and its Belief DFP variant for interpretability.
Evaluate in the MINOS simulator across SUNCG (Empty and Furnished) and Matterport3D environments with RGB, RGB-D, and other sensor inputs.
Provide ground-truth pose and maps where available to analyze performance under different information regimes.
Measure performance with metrics such as SPL, success rate, and pace, and compare to human performance.
Experiment with depth estimation methods (monocular and stereo) to augment RGB inputs for the classic pipeline.
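
The review names SPL and success rate as metrics but does not spell out how they are computed. The sketch below assumes the commonly used definition of SPL (Success weighted by Path Length): each successful episode contributes the ratio of the shortest-path length to the longer of the agent's path and the shortest path, failed episodes contribute zero, and the result is averaged over all episodes. The `Episode` class and its field names are hypothetical, introduced only for illustration, and are not taken from the paper's code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Episode:
    """One navigation episode (hypothetical container, not from the paper)."""
    success: bool                 # did the agent reach the goal?
    shortest_path_length: float   # geodesic start-to-goal distance
    agent_path_length: float      # distance the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes in which the agent reached the goal."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(1 for ep in episodes if ep.success) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if not ep.success:
            continue  # failed episodes contribute 0 to the sum
        longest = max(ep.agent_path_length, ep.shortest_path_length)
        if longest <= 0:
            raise ValueError("path lengths must be positive")
        total += ep.shortest_path_length / longest
    return total / len(episodes)


# Illustrative, made-up episodes: an optimal run, a run twice as long
# as necessary, and a failure.
example_episodes = [
    Episode(success=True, shortest_path_length=10.0, agent_path_length=10.0),
    Episode(success=True, shortest_path_length=10.0, agent_path_length=20.0),
    Episode(success=False, shortest_path_length=10.0, agent_path_length=5.0),
]
example_spl = spl(example_episodes)
example_success = success_rate(example_episodes)
```

Under this definition SPL can never exceed the success rate, so an agent that reaches the goal by wandering is penalized relative to one that takes a near-optimal route.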

Experimental results

Research questions

RQ1: How do classic modular navigation pipelines compare to learned navigation agents in terms of success and efficiency across cluttered 3D environments?
RQ2: How does sensor modality (RGB vs RGB-D) affect the robustness and performance of each approach?
RQ3: To what extent can depth estimation from RGB improve classic SLAM-based navigation?
RQ4: How close do artificial navigation systems come to human performance under similar tasks?

Key findings

A classic pipeline with RGB-D input generally outperforms learned approaches in cluttered environments.
The learned agent performs better with RGB input only than the classic RGB baseline, indicating robustness to reduced sensory information.
Depth information dramatically improves classic navigation performance, whereas RGB-only SLAM is prone to localization failures.
Adding pose/map information to RGB-D input further improves classic navigation, and depth estimated from RGB can partially recover performance when no depth sensor is available.
Humans outperform both artificial approaches across all environments and metrics, highlighting remaining gaps in autonomous navigation.


This review was created by AI and reviewed by human editors.