QUICK REVIEW

[Paper Review] Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction

Huangying Zhan, Ravi Garg | arXiv (Cornell University) | 2018-03-11
Advanced Vision and Imaging | References: 33 | Citations: 85
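One-Line Summary

This paper presents an unsupervised framework that jointly learns single-view depth and monocular visual odometry (VO) from stereo video sequences. Using spatial and temporal photometric losses together with a deep feature reconstruction loss, it achieves metric-scale depth and competitive VO.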

ABSTRACT

Despite learning based methods showing promising results in single view depth estimation and visual odometry, most existing approaches treat the tasks in a supervised manner. Recent approaches to single view depth estimation explore the possibility of learning without full supervision via minimizing photometric error. In this paper, we explore the use of stereo sequences for learning depth and visual odometry. The use of stereo sequences enables the use of both spatial (between left-right pairs) and temporal (forward backward) photometric warp error, and constrains the scene depth and camera motion to be in a common, real-world scale. At test time our framework is able to estimate single view depth and two-view odometry from a monocular sequence. We also show how we can improve on a standard photometric warp loss by considering a warp of deep features. We show through extensive experiments that: (i) jointly training for single view depth and visual odometry improves depth prediction because of the additional constraint imposed on depths and achieves competitive results for visual odometry; (ii) deep feature-based warping loss improves upon simple photometric warp loss for both single view depth estimation and visual odometry. Our method outperforms existing learning based methods on the KITTI driving dataset in both tasks. The source code is available at https://github.com/Huangying-Zhan/Depth-VO-Feat

Motivation and Objectives

  • Address the scale ambiguity in monocular depth and pose estimation by leveraging stereo training data.
  • Jointly learn a depth estimator and a visual odometry network to enforce cross-task consistency.
  • Improve supervision beyond photometric loss by introducing a deep feature reconstruction loss.
  • Show that stereo and temporal constraints improve depth accuracy and VO performance on KITTI.

Proposed Method

  • Train depth (CNN_D) and visual odometry (CNN_VO) networks jointly from stereo video sequences.
  • Use differentiable geometry to synthesize target views via epipolar geometry and bilinear warping, enabling image reconstruction losses.
  • Impose an image reconstruction loss combining left-right and temporal consistency for supervision.
  • Introduce a deep feature reconstruction loss to provide robust, context-aware supervision beyond raw pixel intensities.
  • Apply an edge-aware depth smoothness loss to regularize depth predictions.
  • Optionally fuse features from ImageNet, NYUv2-descriptor, or self-embedded depth features within the feature reconstruction term.
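
The warping and loss terms listed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names (`reproject`, `bilinear_sample`, `reconstruction_loss`, `edge_aware_smoothness`) are hypothetical, the standard pinhole reprojection p_s ~ K T D(p_t) K^-1 p_t is assumed, border clamping is an arbitrary choice, and the smoothness term uses a common first-order form that may differ from the paper's exact formulation. NumPy computes only the forward pass; actual training would need an autodiff framework.

```python
import numpy as np

def reproject(depth, K, T):
    """Map every target pixel to (x, y) coordinates in the source view.

    depth: (H, W) target-view depth; K: (3, 3) intrinsics;
    T: (4, 4) rigid transform from target camera frame to source camera frame.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)  # back-project to 3D
    cam = T[:3, :3] @ cam + T[:3, 3:4]                     # move into source frame
    proj = K @ cam                                         # project to source pixels
    z = np.clip(proj[2], 1e-6, None)                       # guard against z <= 0
    return (proj[0] / z).reshape(H, W), (proj[1] / z).reshape(H, W)

def bilinear_sample(src, x, y):
    """Sample src (H, W, C) at float coordinates, clamping at the border (assumption)."""
    H, W = src.shape[:2]
    x = np.clip(x, 0, W - 1)
    y = np.clip(y, 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = (1 - wx) * src[y0, x0] + wx * src[y0, x1]
    bottom = (1 - wx) * src[y1, x0] + wx * src[y1, x1]
    return (1 - wy) * top + wy * bottom

def reconstruction_loss(target, source, depth, K, T):
    """Mean L1 error between the target and the source warped into the target view.

    Works for RGB images (C = 3) and for dense feature maps (C = feature size),
    so the same routine covers the photometric and the feature reconstruction terms.
    """
    x, y = reproject(depth, K, T)
    return np.abs(target - bilinear_sample(source, x, y)).mean()

def edge_aware_smoothness(depth, image):
    """First-order smoothness: depth gradients cost less where the image has edges."""
    d_dx = np.abs(np.diff(depth, axis=1))
    d_dy = np.abs(np.diff(depth, axis=0))
    i_dx = np.abs(np.diff(image, axis=1)).mean(axis=-1)
    i_dy = np.abs(np.diff(image, axis=0)).mean(axis=-1)
    return (d_dx * np.exp(-i_dx)).mean() + (d_dy * np.exp(-i_dy)).mean()
```

In this sketch, the left-right term would call `reconstruction_loss` with `T` set to the fixed, known stereo extrinsics, and the temporal term would call it with the pose predicted by CNN_VO. Because the stereo baseline is known in metric units, the depth that minimizes the left-right term is tied to real-world scale, which in turn constrains the predicted camera motion.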

Experimental Results

Research Questions

  • RQ1: Can stereo training remove per-frame scale ambiguity in monocular depth and VO estimation and provide metric scale at test time?
  • RQ2: Does incorporating temporal information and deep feature-based reconstruction improve depth and VO accuracy beyond color-based photometric losses?
  • RQ3: What is the impact of jointly training depth and pose networks on depth quality and frame-to-frame odometry performance on KITTI?

Key Results

  • The stereo-based, joint training framework yields metric-scale depth and competitive monocular VO results without external scale supervision.
  • Deep feature reconstruction loss improves depth and VO accuracy over pure photometric (color) warp loss.
  • Joint depth and VO training with stereo and temporal constraints outperforms prior learning-based VO methods and remains competitive with geometric baselines on KITTI.
  • Incorporating learned features (from ImageNet or self-supervised depth features) in the warp loss further boosts performance.
  • The approach achieves state-of-the-art results among unsupervised methods on KITTI for both single-view depth estimation and frame-to-frame VO.


This review was generated by AI and reviewed by a human editor.