QUICK REVIEW

[Paper Review] DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data

Wei Yin, Xinlong Wang|arXiv (Cornell University)|Feb 3, 2020

Advanced Vision and Imaging|References: 30|Cited by: 34

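One-sentence summary

DiverseDepth learns affine-invariant depth from a large, diverse dataset using a multi-curriculum learning strategy, achieving strong zero-shot generalization in monocular depth prediction while preserving the geometric structure of the scene.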

ABSTRACT

We present a method for depth estimation with monocular images, which can predict high-quality depth on diverse scenes up to an affine transformation, thus preserving accurate shapes of a scene. Previous methods that predict metric depth often work well only for a specific scene. In contrast, learning relative depth (information of being closer or further) can enjoy better generalization, with the price of failing to recover the accurate geometric shape of the scene. In this work, we propose a dataset and methods to tackle this dilemma, aiming to predict accurate depth up to an affine transformation with good generalization to diverse scenes. First we construct a large-scale and diverse dataset, termed Diverse Scene Depth dataset (DiverseDepth), which has a broad range of scenes and foreground contents. Compared with previous learning objectives, i.e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes. Furthermore, in order to train the model on the complex dataset effectively, we propose a multi-curriculum learning method. Experiments show that our method outperforms previous methods on 8 datasets by a large margin with the zero-shot test setting, demonstrating the excellent generalization capacity of the learned model to diverse scenes. The reconstructed point clouds with the predicted depth show that our method can recover high-quality 3D shapes. Code and dataset are available at: https://tinyurl.com/DiverseDepth

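Motivation and goals

Enable depth estimation that generalizes across diverse scenes while preserving accurate 3D geometry.
Build a large-scale, diverse RGB-D dataset (DiverseDepth) covering rigid and non-rigid content and both indoor and outdoor scenes.
Propose affine-invariant depth prediction, which factors an unknown scale and shift out of the depth to improve generalization.
Develop a multi-curriculum learning scheme for effective training on complex, diverse data.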

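Proposed method

Introduces the DiverseDepth dataset, split into three parts: Part-fore (foreground), Part-in (indoor background), and Part-out (outdoor background).
Formulates affine-invariant depth prediction by decoupling the scale and shift between the real and virtual camera systems.
Uses a loss that combines high-order geometric constraints (virtual normals, surface normals) with a scale-and-shift-invariant loss (SSIL).
Adopts a multi-curriculum learning (MCL) strategy that ranks the data by difficulty and draws mini-batches from the three dataset parts in easy-to-hard order.
Evaluates zero-shot on eight datasets with the Abs-Rel and WHDR metrics, after rescaling the predicted affine-invariant depth to metric depth.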
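The rescaling step used for evaluation can be illustrated with a small sketch. This summary does not state how the paper fits the scale and shift, so the code below assumes an ordinary least-squares fit over valid pixels, and the function names (align_scale_shift, abs_rel) are hypothetical. Abs-Rel is computed as the mean of |prediction - ground truth| / ground truth.

```python
import numpy as np

def align_scale_shift(pred, gt, mask=None):
    """Fit scale s and shift t so that s * pred + t approximates gt.

    Assumption: an ordinary least-squares fit over valid pixels; the
    paper's exact alignment procedure is not given in this summary.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    gt = np.asarray(gt, dtype=np.float64).ravel()
    if mask is None:
        mask = gt > 0  # assumption: non-positive ground truth marks invalid pixels
    else:
        mask = np.asarray(mask, dtype=bool).ravel()
    # Design matrix [pred, 1] for the linear model gt = s * pred + t.
    A = np.stack([pred[mask], np.ones(int(mask.sum()))], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt[mask], rcond=None)
    return float(s), float(t)

def abs_rel(pred_metric, gt, mask=None):
    """Absolute relative error: mean(|pred - gt| / gt) over valid pixels."""
    pred_metric = np.asarray(pred_metric, dtype=np.float64).ravel()
    gt = np.asarray(gt, dtype=np.float64).ravel()
    if mask is None:
        mask = gt > 0
    else:
        mask = np.asarray(mask, dtype=bool).ravel()
    return float(np.mean(np.abs(pred_metric[mask] - gt[mask]) / gt[mask]))
```

A typical use would be `s, t = align_scale_shift(pred, gt)` followed by `abs_rel(s * pred + t, gt)`. A prediction that differs from the ground truth only by a scale and a shift then scores an Abs-Rel of (numerically) zero, which is the sense in which the depth is accurate "up to an affine transformation".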

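Experimental results

Research questions

RQ1: Does affine-invariant depth learned on a diverse dataset generalize to unseen scenes better than metric-depth or relative-depth methods?
RQ2: Does combining a large-scale, diverse training corpus with structured curriculum learning improve cross-domain depth prediction quality?
RQ3: How does combining the VNL/SSIL losses with affine invariance affect 3D shape reconstruction?
RQ4: How does the proposed method perform on foreground objects (such as people) compared with background scenes?

Key findings

Outperforms existing metric-depth and relative-depth methods on 8 zero-shot datasets, with reported relative improvements of up to 70%.
On NYU, the method is competitive with methods trained specifically on NYU (11.7% Abs-Rel versus 12.3% for the competing method).
The method yields higher-quality 3D reconstructions and preserves scene geometry better than relative-depth baselines.
Ablations show that multi-curriculum learning generalizes significantly better than uniform sampling and anti-curriculum variants.
Loss analysis shows that VNL and SSIL outperform other losses for affine-invariant depth on diverse data.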
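To make the idea behind a scale-and-shift-invariant loss concrete, here is a minimal sketch. The exact SSIL formula from the paper is not reproduced in this summary, so the code assumes one common formulation: each depth map is normalized by its median (shift) and mean absolute deviation from the median (scale), and the loss is the mean absolute difference of the normalized maps. The function names are hypothetical.

```python
import numpy as np

def normalize_depth(d, mask):
    """Remove shift (median) and scale (mean absolute deviation) from a depth map."""
    valid = d[mask]
    shift = np.median(valid)
    scale = np.mean(np.abs(valid - shift))
    return (d - shift) / max(scale, 1e-8)  # guard against constant depth maps

def scale_shift_invariant_loss(pred, gt, mask=None):
    """Mean absolute error between normalized prediction and ground truth.

    Assumption: median / mean-absolute-deviation normalization; this is
    an illustrative variant, not necessarily the loss used in the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if mask is None:
        mask = gt > 0  # assumption: non-positive ground truth marks invalid pixels
    diff = normalize_depth(pred, mask) - normalize_depth(gt, mask)
    return float(np.mean(np.abs(diff[mask])))
```

Under this normalization, a ground truth of the form `a * pred + b` with `a > 0` gives a loss of (numerically) zero, so the network is not penalized for the scale and shift that differ across data sources; only differences in shape contribute.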


This review was generated by AI and reviewed by human editors.