QUICK REVIEW

[Paper Review] Deep Ordinal Regression Network for Monocular Depth Estimation

Huan Fu, Mingming Gong|arXiv (Cornell University)|Jun 6, 2018

Advanced Vision and Imaging|Cited by 173

One-sentence summary

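This paper proposes a deep ordinal regression network (DORN) for monocular depth estimation. It combines spacing-increasing discretization (SID) with an ordinal regression loss and achieves state-of-the-art results on several benchmarks, using a lightweight architecture that extracts multi-scale features and avoids heavy spatial pooling.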

ABSTRACT

Monocular depth estimation, which plays a crucial role in understanding 3D scene geometry, is an ill-posed problem. Recent methods have gained significant improvement by exploring image-level information and hierarchical features from deep convolutional neural networks (DCNNs). These methods model depth estimation as a regression problem and train the regression networks by minimizing mean squared error, which suffers from slow convergence and unsatisfactory local solutions. Besides, existing depth estimation networks employ repeated spatial pooling operations, resulting in undesirable low-resolution feature maps. To obtain high-resolution depth maps, skip-connections or multi-layer deconvolution networks are required, which complicates network training and consumes much more computation. To eliminate or at least largely reduce these problems, we introduce a spacing-increasing discretization (SID) strategy to discretize depth and recast depth network learning as an ordinal regression problem. By training the network using an ordinal regression loss, our method achieves much higher accuracy and faster convergence in synch. Furthermore, we adopt a multi-scale network structure which avoids unnecessary spatial pooling and captures multi-scale information in parallel. The method described in this paper achieves state-of-the-art results on four challenging benchmarks, i.e., KITTI [17], ScanNet [9], Make3D [50], and NYU Depth v2 [42], and wins the 1st prize in Robust Vision Challenge 2018. Code has been made available at: https://github.com/hufu6371/DORN.

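Research motivation and goals

Address the ill-posed nature of estimating depth from a single image.
Improve training convergence and final accuracy over standard regression with an MSE loss.
Avoid aggressive spatial pooling by using a high-resolution, multi-scale architecture with dilated convolutions.
Introduce the spacing-increasing discretization strategy and an ordinal regression loss, and train the depth network end to end.
Demonstrate state-of-the-art performance on four challenging depth benchmarks and offer practical guidance on depth discretization and network design.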

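Proposed method

Discretize continuous depth values into intervals using spacing-increasing discretization (SID) rather than uniform discretization (UD).
Treat depth estimation as an ordinal regression problem and optimize it with a tailored ordinal regression loss that accounts for the ordering of the labels.
Use a dense feature extractor based on dilated convolutions to preserve resolution, removing the final downsampling layers to avoid losing spatial detail.
Introduce a multi-scale scene understanding module (ASPP with several dilation rates, a cross-channel branch, and a lightweight full-image encoder) to capture global and multi-scale information.
Train the network end to end, without stage-wise training or iterative refinement.
Decode the predicted discrete depth by averaging the interval thresholds around the most likely ordinal label.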
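
To make the discretization and decoding steps above concrete, here is a minimal sketch in plain Python. It assumes that SID places bin edges uniformly in log-depth between a minimum depth alpha and a maximum depth beta, and that decoding takes the midpoint of the two edges of the predicted bin. The function names, the clamping behavior, and the example range are illustrative assumptions, not taken from the authors' released code.

```python
import math


def sid_thresholds(alpha, beta, num_bins):
    """Edges t_0..t_K spaced uniformly in log-depth (assumed SID form).

    Requires 0 < alpha < beta. Bins get wider as depth increases.
    """
    log_a, log_b = math.log(alpha), math.log(beta)
    return [math.exp(log_a + (log_b - log_a) * i / num_bins)
            for i in range(num_bins + 1)]


def ud_thresholds(alpha, beta, num_bins):
    """Edges t_0..t_K spaced uniformly in depth (uniform discretization)."""
    return [alpha + (beta - alpha) * i / num_bins
            for i in range(num_bins + 1)]


def depth_to_label(depth, edges):
    """Index of the bin containing depth, clamped to the valid range."""
    num_bins = len(edges) - 1
    for i in range(num_bins):
        if depth < edges[i + 1]:
            return i
    return num_bins - 1  # depth at or beyond the last edge


def decode_label(label, edges):
    """Depth estimate for a bin: the midpoint of its two edges."""
    return 0.5 * (edges[label] + edges[label + 1])
```

With alpha = 1, beta = 16 and 4 bins, the SID edges come out at roughly 1, 2, 4, 8, 16, while the UD edges are 1, 4.75, 8.5, 12.25, 16. Nearby depths get fine bins and distant depths get coarse ones, which is the intended effect of spacing the bins by relative rather than absolute depth.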

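Experimental results

Research questions

RQ1: Does SID combined with ordinal regression outperform regression-style training in depth accuracy and convergence?
RQ2: How do a dilated-convolution architecture and the avoidance of heavy pooling affect depth map quality and computation?
RQ3: How much does the proposed full-image encoder contribute to performance relative to other global-context strategies?
RQ4: How sensitive is performance to the number of depth intervals used by SID?
RQ5: Does the method generalize across outdoor and indoor benchmarks (KITTI, ScanNet, Make3D, NYU Depth v2)?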

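Key findings

DORN achieves state-of-the-art results on the KITTI, ScanNet, Make3D, and NYU Depth v2 benchmarks.
SID yields better depth estimation performance than uniform discretization.
The ordinal regression loss over ordered depth intervals improves convergence and accuracy compared with a standard regression loss.
The compact full-image encoder significantly reduces the parameter count while matching or exceeding fc-based full-image encoding.
Removing the last pooling layers and using dilated convolutions yields high-resolution depth maps without heavy multi-scale fusion.
The method performs well on both outdoor and indoor datasets and ranks highly on the online evaluation servers.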
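
As an illustration of why an ordinal loss treats ordered depth bins differently from a plain classification or regression loss, the sketch below implements a generic cumulative-threshold ordinal loss for a single pixel. It assumes the network outputs, for each threshold k, a probability that the true bin index exceeds k. The function names and the example probabilities are hypothetical, and this is a simplified per-pixel version, not the authors' implementation.

```python
import math


def ordinal_loss(probs, label, eps=1e-7):
    """Cumulative-threshold ordinal loss for one pixel (illustrative).

    probs[k] is the predicted probability that the true bin index is
    greater than k. Thresholds below the label should fire (p near 1);
    thresholds at or above it should not (p near 0).
    """
    loss = 0.0
    for k, p in enumerate(probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if k < label:
            loss -= math.log(p)
        else:
            loss -= math.log(1.0 - p)
    return loss


def decode_ordinal(probs, threshold=0.5):
    """Predicted bin index: number of thresholds judged to be exceeded."""
    return sum(1 for p in probs if p >= threshold)


# Hypothetical per-threshold outputs for one pixel.
example_probs = [0.9, 0.8, 0.3, 0.1]
predicted_bin = decode_ordinal(example_probs)
losses_by_label = [ordinal_loss(example_probs, lbl) for lbl in range(5)]
```

For these example outputs the loss is smallest when the label equals the decoded bin and grows as the label moves further away in either direction, so a prediction that is off by one bin is penalized less than one that is off by several. A loss that ignores label order would not make that distinction.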


This review was generated by AI and reviewed by a human editor.