QUICK REVIEW

[Paper Review] Monocular Depth Estimation: A Survey

Amlaan Bhoi | arXiv (Cornell University) | Jan 27, 2019

Advanced Vision and Imaging | References: 16 | Cited by: 87

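One-Sentence Summary

A survey of five monocular depth estimation methods (supervised, weakly supervised, and unsupervised), covering their multi-scale, CRF-based, and ordinal regression approaches, together with dataset evaluation and trend analysis.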

ABSTRACT

Monocular depth estimation is often described as an ill-posed and inherently ambiguous problem. Estimating depth from 2D images is a crucial step in scene reconstruction, 3D object recognition, segmentation, and detection. The problem can be framed as: given a single RGB image as input, predict a dense depth map for each pixel. This problem is worsened by the fact that most scenes have large texture and structural variations, object occlusions, and rich geometric detailing. All these factors contribute to difficulty in accurate depth estimation. In this paper, we review five papers that attempt to solve the depth estimation problem with various techniques including supervised, weakly-supervised, and unsupervised learning techniques. We then compare these papers and understand the improvements made over one another. Finally, we explore potential improvements that can aid to better solve this problem.

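Research Motivation and Objectives

Motivate monocular depth estimation (MDE) and define it as predicting a dense, per-pixel depth map from a single RGB image.
Survey five representative MDE methods spanning the supervised, weakly supervised, and unsupervised paradigms.
Compare architectures, loss functions, and fusion strategies (multi-scale features, CRFs, attention, and ordinal regression).
Highlight datasets, performance trends, and potential directions for improvement.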

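Proposed Methods

Describe the evolution from early multi-scale deep networks to multi-scale CRFs and cascaded CRFs.
Explain the scale-invariant loss and how it reduces scale ambiguity.
Outline the continuous CRF formulation used for multi-scale fusion and sequential deep networks.
Summarize structured attention mechanisms for feature fusion and the ordinal regression approach based on SID (spacing-increasing discretization).
Introduce unsupervised left-right consistency training that uses stereo pairs and an image reconstruction loss.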
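To make the scale-invariant loss mentioned above concrete, here is a minimal sketch of the log-space error as it is commonly stated for Eigen et al. (2014): with d_i = log(pred_i) - log(target_i), the error is mean(d^2) - lam * mean(d)^2. The function name, the `lam` parameter name, and the toy depth values are assumptions made for illustration and do not come from the survey.

```python
import math

def scale_invariant_log_error(pred, target, lam=1.0):
    """Scale-invariant error in log space over paired positive depths."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    # Per-pixel difference of log depths.
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(d)
    mean_sq = sum(x * x for x in d) / n
    sq_mean = (sum(d) / n) ** 2
    # lam=1.0 cancels any global scale factor; lam=0.0 is plain log-space MSE.
    return mean_sq - lam * sq_mean

# Hypothetical toy data: the prediction is correct up to a global scale of 2.
target = [1.0, 2.0, 4.0, 8.0]
scaled = [2.0 * t for t in target]
print(scale_invariant_log_error(scaled, target, lam=1.0))
print(scale_invariant_log_error(scaled, target, lam=0.0))
```

With lam=1.0 the error for the uniformly scaled prediction is approximately zero, while with lam=0.0 it equals (log 2)^2. This is the sense in which the loss reduces scale ambiguity: a prediction is not penalized for being off by one global factor, only for errors in relative depth.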

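Experimental Results

Research Questions

RQ1: What are the core techniques driving monocular depth estimation in supervised, weakly supervised, and unsupervised settings?
RQ2: How do multi-scale features, CRFs, and attention mechanisms affect depth prediction accuracy?
RQ3: What are the advantages and limitations of treating depth prediction as regression, as continuous CRFs, or as ordinal regression?
RQ4: How do datasets such as NYU Depth V2 and KITTI support comparison and benchmarking across methods?
RQ5: Which future directions could improve the accuracy and generalization of MDE?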

Key Findings

Method	rel	log10	rms	delta<1.25	delta<1.25^2	delta<1.25^3
Eigen et al. (2014)	0.215	-	0.907	0.611	0.887	0.971
Xu et al. (2018a)	0.121	0.052	0.586	0.811	0.954	0.987
Xu et al. (2018b)	0.125	0.057	0.593	0.806	0.952	0.986
Fu et al. (2018)	0.115	0.051	0.509	0.828	0.965	0.992
Godard et al. (2017)	-	-	-	-	-	-
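The column names in the table follow the metrics usually reported in this literature. As a rough sketch, assuming the standard definitions (mean absolute relative error, mean absolute log10 error, root mean squared error, and the fraction of pixels whose ratio max(pred/target, target/pred) falls below a threshold), they can be computed as follows. The function name, dictionary keys, and toy values are hypothetical, and any cropping or depth capping used in the original evaluations is not reproduced here.

```python
import math

def depth_metrics(pred, target):
    """Common depth-estimation metrics over paired positive depths."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    n = len(target)
    pairs = list(zip(pred, target))
    # Mean absolute relative error (lower is better).
    rel = sum(abs(p - t) / t for p, t in pairs) / n
    # Mean absolute error in log10 space (lower is better).
    log10 = sum(abs(math.log10(p) - math.log10(t)) for p, t in pairs) / n
    # Root mean squared error (lower is better).
    rms = math.sqrt(sum((p - t) ** 2 for p, t in pairs) / n)
    # Threshold accuracy: share of pixels with ratio below 1.25^k (higher is better).
    ratios = [max(p / t, t / p) for p, t in pairs]
    acc = {k: sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}
    return {"rel": rel, "log10": log10, "rms": rms,
            "delta1": acc[1], "delta2": acc[2], "delta3": acc[3]}

# Hypothetical toy example with one pixel overestimated by 50%.
metrics = depth_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 2.0, 4.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Read against the table, the three error columns should decrease and the three delta columns should increase as methods improve.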

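Multi-scale feature fusion, together with scale-invariant or ordinal losses, improves depth estimation accuracy across datasets.
CRF-based fusion (continuous and cascaded) combined with deep networks achieves competitive RMSE and accuracy metrics on NYU Depth V2.
Structured attention and feature-level fusion can improve multi-scale information flow and speed up inference.
Unsupervised left-right consistency methods generalize well across datasets and achieve competitive RMSE and accuracy on KITTI.
SID-based discretization benefits depth estimation by accounting, within an ordinal regression framework, for the greater uncertainty at large depths.
Overall, supervised methods that use multi-scale features generally outperform earlier methods, while unsupervised methods show strong generalization when trained with stereo signals.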
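The SID (spacing-increasing discretization) idea referred to above can be illustrated with a short sketch. It assumes the usual formulation attributed to Fu et al. (2018), in which a depth range [alpha, beta] is split into K bins whose edges are uniform in log space, t_i = exp(log(alpha) + i * log(beta / alpha) / K). The function names, the label-clamping behavior, and the example range are assumptions for illustration only.

```python
import bisect
import math

def sid_thresholds(alpha, beta, k):
    """Return k + 1 bin edges spaced uniformly in log depth over [alpha, beta]."""
    if not (0 < alpha < beta) or k < 1:
        raise ValueError("require 0 < alpha < beta and k >= 1")
    step = math.log(beta / alpha) / k
    return [math.exp(math.log(alpha) + i * step) for i in range(k + 1)]

def depth_to_ordinal_label(depth, thresholds):
    """Map a depth to the index of its bin, clamped to [0, k - 1]."""
    k = len(thresholds) - 1
    idx = bisect.bisect_right(thresholds, depth) - 1
    return min(max(idx, 0), k - 1)

# Hypothetical range of 1 to 16 depth units split into 4 bins.
edges = sid_thresholds(1.0, 16.0, 4)
print([round(e, 3) for e in edges])
print(depth_to_ordinal_label(3.0, edges))
print(depth_to_ordinal_label(10.0, edges))
```

In this example the edges fall at roughly 1, 2, 4, 8, and 16, so bins widen with distance. Nearby depths get fine-grained labels, and far depths, where estimates are less certain, share coarser bins.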


This review was generated by AI and checked by human editors.