QUICK REVIEW

[Paper Review] Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

René Ranftl, Katrin Lasinger|arXiv (Cornell University)|Jul 2, 2019

Advanced Vision and Imaging | Cited by 44

ONE-SENTENCE SUMMARY

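The authors develop loss functions and training strategies for mixing diverse depth datasets, which enables zero-shot cross-dataset transfer for monocular depth estimation and achieves state-of-the-art results on multiple datasets.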

ABSTRACT

The success of monocular depth estimation relies on large and diverse training sets. Due to the challenges associated with acquiring dense ground-truth depth across different environments at scale, a number of datasets with distinct characteristics and biases have emerged. We develop tools that enable mixing multiple datasets during training, even if their annotations are incompatible. In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks. Armed with these tools, we experiment with five diverse training datasets, including a new, massive data source: 3D films. To demonstrate the generalization power of our approach we use zero-shot cross-dataset transfer, i.e. we evaluate on datasets that were not seen during training. The experiments confirm that mixing data from complementary sources greatly improves monocular depth estimation. Our approach clearly outperforms competing methods across diverse datasets, setting a new state of the art for monocular depth estimation. Some results are shown in the supplementary video at https://youtu.be/D46FzVyL9I8

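RESEARCH MOTIVATION AND GOALS

Achieve robust monocular depth estimation across diverse environments by leveraging multiple datasets, each with its own biases.
Develop a training objective that is invariant to differences in scale and stereo baseline between datasets.
Propose a principled multi-objective data-mixing strategy for combining data from diverse sources.
Highlight the importance of high-capacity encoders, and of their pretraining, for improving performance.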
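The multi-objective mixing goal can be illustrated with the two-dataset case of min-norm gradient combination (the multiple-gradient-descent rule), a standard way to obtain a single update direction that does not increase either loss to first order. This is a sketch under assumptions: gradients are plain lists of floats, only two datasets are handled (more datasets need an iterative solver such as Frank-Wolfe), the helper names are hypothetical, and this summary does not confirm that the paper uses exactly this solver.

```python
"""Two-dataset min-norm gradient combination (MGDA-style), pure Python.

Assumption: illustrates one standard way to pick a Pareto-stationary
descent direction; it is not taken from the paper's code.
"""


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def min_norm_weight(g1, g2):
    """Return alpha in [0, 1] minimizing ||alpha * g1 + (1 - alpha) * g2||^2."""
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:  # identical gradients: every weight gives the same vector
        return 0.5
    # Setting the derivative to zero gives alpha = (g2 - g1) . g2 / ||g1 - g2||^2.
    alpha = dot([-d for d in diff], g2) / denom
    return min(1.0, max(0.0, alpha))  # clip to the simplex


def combine(g1, g2):
    """Shared update direction for the losses of two (hypothetical) datasets."""
    a = min_norm_weight(g1, g2)
    return [a * x + (1.0 - a) * y for x, y in zip(g1, g2)]


def naive_mix(g1, g2):
    """Equal-share mixing: plain average of the per-dataset gradients."""
    return [0.5 * (x + y) for x, y in zip(g1, g2)]
```

With g1 = [1.0, 0.0] and g2 = [4.0, 0.0], naive_mix returns [2.5, 0.0], dominated by the dataset with the larger gradient, while combine returns [1.0, 0.0]. When the two gradients conflict, the min-norm direction has equal inner products with both, so neither dataset's loss is favored at first order.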

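PROPOSED METHOD

Predict in disparity space to handle the unknown scale and shift across datasets.
Introduce a scale- and shift-invariant loss (L_ssi) with both a least-squares variant and robust variants (L_ssimse, L_ssimae, L_ssitrim).
Provide alignment strategies that solve for the scale and shift (s, t) during loss computation.
Add a gradient regularization term (L_reg) that sharpens depth discontinuities and aligns them with ground-truth edges.
Compare naive multi-dataset mixing during training against a Pareto-optimal strategy (equal partitioning across datasets vs. multi-objective optimization).
Evaluate encoder architectures and pretraining (ImageNet vs. weakly supervised (WS) augmented pretraining) for their impact on cross-dataset transfer.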
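The loss and alignment items above can be sketched in pure Python, assuming flattened disparity maps in which every pixel is valid. The least-squares (s, t) follows from the normal equations of min over (s, t) of the sum of (s·d_i + t - d*_i)². The median / mean-absolute-deviation normalization for the MAE variants, the default 20% trim fraction, the 1/(2M) normalization, and the single-scale gradient term are assumptions about details this summary does not spell out (the paper may use masks and several scales); all function names are hypothetical.

```python
"""Sketch of scale- and shift-invariant disparity losses (pure Python).

Assumptions: no invalid pixels, small lists of floats, and illustrative
constants (1/(2M) normalization, 20% trimming, single-scale gradient term).
"""
from statistics import median


def align_least_squares(pred, gt):
    """Closed-form (s, t) minimizing sum_i (s * pred_i + t - gt_i) ** 2."""
    n = len(pred)
    sx, sy = sum(pred), sum(gt)
    sxx = sum(p * p for p in pred)
    sxy = sum(p * g for p, g in zip(pred, gt))
    det = n * sxx - sx * sx
    if det == 0:  # constant prediction: the scale is undetermined
        raise ValueError("prediction has zero variance")
    s = (n * sxy - sx * sy) / det
    t = (sy - s * sx) / n
    return s, t


def ssi_mse(pred, gt):
    """L_ssimse-style loss: squared error after least-squares alignment."""
    s, t = align_least_squares(pred, gt)
    return sum((s * p + t - g) ** 2 for p, g in zip(pred, gt)) / (2 * len(pred))


def normalize_median_mad(d):
    """Shift by the median, then scale by the mean absolute deviation from it."""
    t = median(d)
    s = sum(abs(x - t) for x in d) / len(d)
    if s == 0:
        raise ValueError("map is constant; cannot normalize")
    return [(x - t) / s for x in d]


def ssi_trimmed_mae(pred, gt, trim=0.2):
    """L_ssitrim-style loss: MAE of normalized maps without the largest residuals.

    trim=0.0 gives the untrimmed MAE variant (L_ssimae-style).
    """
    p_hat, g_hat = normalize_median_mad(pred), normalize_median_mad(gt)
    residuals = sorted(abs(a - b) for a, b in zip(p_hat, g_hat))
    keep = len(residuals) - int(round(len(residuals) * trim))
    return sum(residuals[:keep]) / (2 * len(residuals))


def gradient_matching(residual):
    """Single-scale gradient term on a 2D residual map (list of rows).

    residual[y][x] is the aligned prediction minus ground truth; the term
    sums |dR/dx| + |dR/dy| via forward differences, divided by pixel count.
    """
    h, w = len(residual), len(residual[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(residual[y][x + 1] - residual[y][x])
            if y + 1 < h:
                total += abs(residual[y + 1][x] - residual[y][x])
    return total / (h * w)
```

For pred = [1, 2, 3, 4] and gt = [3, 5, 7, 9] (gt = 2·pred + 1), align_least_squares returns (2.0, 1.0) and both losses evaluate to 0.0: a prediction that differs from the ground truth only by scale and shift is not penalized, which is the invariance the method needs when datasets disagree on scale and baseline. The gradient term is likewise zero for a constant residual and only responds to residual structure such as misplaced edges.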

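EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Does mixing multiple biased depth datasets improve generalization to unseen datasets (zero-shot transfer)?
RQ2: How should inconsistencies in scale and baseline between datasets be handled during training?
RQ3: For monocular depth estimation, does a multi-objective (Pareto) data-mixing strategy outperform naive mixing?
RQ4: How do encoder capacity and pretraining affect cross-dataset transfer performance?
RQ5: Given scale- and shift-invariant losses, is predicting in disparity space numerically stable and effective across different data sources?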

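KEY FINDINGS

Mixing complementary datasets substantially improves monocular depth estimation under zero-shot cross-dataset transfer.
Scale- and shift-invariant losses in disparity space, including combined variants such as L_ssitrim + L_reg, outperform previously proposed losses.
High-capacity encoders pretrained on ImageNet (in particular ResNeXt-101-WSL) yield substantial performance gains.
Pretraining on large-scale auxiliary tasks is essential for strong performance.
Pareto-optimal multi-objective data mixing yields gains over naive mixing with equal shares per dataset.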

This review was generated by AI and checked by human editors.