QUICK REVIEW

[Paper Review] Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth

Doyeon Kim, Woonghyun Ga|arXiv (Cornell University)|Jan 19, 2022

Advanced Vision and Imaging | Cited by 78

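One-Sentence Summary

This paper proposes a global-local path network for monocular depth estimation. It pairs a hierarchical Transformer encoder with a lightweight decoder that uses selective feature fusion, introduces the vertical CutDepth data augmentation, achieves state-of-the-art results on NYU Depth V2, and generalizes well across datasets.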

ABSTRACT

Depth estimation from a single image is an important task that can be applied to various fields in computer vision, and has grown rapidly with the development of convolutional neural networks. In this paper, we propose a novel structure and training strategy for monocular depth estimation to further improve the prediction accuracy of the network. We deploy a hierarchical transformer encoder to capture and convey the global context, and design a lightweight yet powerful decoder to generate an estimated depth map while considering local connectivity. By constructing connected paths between multi-scale local features and the global decoding stream with our proposed selective feature fusion module, the network can integrate both representations and recover fine details. In addition, the proposed decoder shows better performance than the previously proposed decoders, with considerably less computational complexity. Furthermore, we improve the depth-specific augmentation method by utilizing an important observation in depth estimation to enhance the model. Our network achieves state-of-the-art performance over the challenging depth dataset NYU Depth V2. Extensive experiments have been conducted to validate and show the effectiveness of the proposed approach. Finally, our model shows better generalisation ability and robustness than other comparative models.

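Motivation and Objectives

Improve monocular depth estimation by capturing both global context and local detail.
Develop a global-local path network that combines a hierarchical Transformer encoder with an efficient decoder.
Propose a selective feature fusion module that merges multi-scale local features with global features at low complexity.
Improve training through depth-specific data augmentation (notably vertical CutDepth) that exploits vertical structural cues.
Demonstrate state-of-the-art performance and robustness on NYU Depth V2, as well as generalization to SUN RGB-D.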

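Proposed Method

Use a hierarchical Transformer encoder to model global context and multi-scale features.
Design a lightweight decoder that restores the bottleneck features using a minimal number of convolution layers and bilinear upsampling.
Introduce a selective feature fusion (SFF) module that adaptively fuses local and global features through an attention mechanism.
Adopt vertical CutDepth, a depth-aware augmentation that crops only along the horizontal direction so that vertical structural information is preserved.
Train with a scale-invariant log depth loss to optimize the depth predictions.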
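
The last two items of the method list can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The crop rule (a full-height strip whose left edge is `alpha * W` and whose width is at most `(W - left) * beta * p`), the pasting of the depth map into all three colour channels, and the weighting `lam = 0.5` in the loss are assumptions made for illustration. The function names `vertical_cutdepth` and `scale_invariant_log_loss` are hypothetical, and the depth map is assumed to be pre-scaled to the image value range.

```python
import numpy as np


def vertical_cutdepth(image, depth, p=0.75, rng=None):
    """Paste a full-height strip of the depth map into the RGB image.

    image: (H, W, 3) float array; depth: (H, W) float array, assumed to be
    pre-scaled to the same value range as the image.
    Returns the augmented image and the (left, width) of the pasted strip.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = depth.shape
    alpha, beta = rng.uniform(size=2)  # both drawn from U(0, 1)

    # Assumed crop rule: random left edge, random width scaled by p,
    # and no cropping at all along the vertical axis.
    left = int(alpha * width)
    crop_w = max(int((width - left) * beta * p), 1)

    out = image.copy()
    # Broadcast the single-channel depth strip across the three channels.
    out[:, left:left + crop_w, :] = depth[:, left:left + crop_w, None]
    return out, (left, crop_w)


def scale_invariant_log_loss(pred, target, lam=0.5):
    """Scale-invariant log loss: mean(d^2) - lam * mean(d)^2, d = log pred - log target.

    pred and target must be strictly positive arrays of the same shape.
    """
    d = np.log(pred) - np.log(target)
    return float(np.mean(d ** 2) - lam * np.mean(d) ** 2)
```

With `lam = 1.0` the loss ignores any global scale factor between prediction and ground truth, while smaller values of `lam` keep part of the penalty on absolute scale. The strip always spans the full image height, which is what keeps the vertical layout of the scene intact.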

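Experimental Results

Research Questions

RQ1: Does the global-local path structure improve monocular depth estimation by effectively combining long-range context with local detail?
RQ2: Does the proposed selective feature fusion module produce better depth maps at lower computational cost than standard decoders?
RQ3: Does the vertical CutDepth augmentation improve depth estimation by exploiting vertical structural cues?
RQ4: How well does the proposed method generalize to other indoor datasets (such as SUN RGB-D), and how robust is it to common image corruptions?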

Key Findings

Method	Params (M)	delta1↑	delta2↑	delta3↑	AbsRel↓	RMSE↓	log10↓
Eigen et al. (2014)	141	0.769	0.950	0.988	0.158	0.641	-
Fu et al. (2018)	110	0.828	0.965	0.992	0.115	0.509	0.051
Yin et al. (2019)	114	0.875	0.976	0.994	0.108	0.416	0.048
DAV (Huynh et al. 2020)	25	0.882	0.980	0.996	0.108	0.412	-
BTS (Lee et al. 2019)	47	0.885	0.978	0.994	0.110	0.392	0.047
AdaBins (Bhat et al. 2021)	78	0.903	0.984	0.997	0.103	0.364	0.044
DPT* (Ranftl et al. 2021)	123	0.904	0.988	0.998	0.110	0.357	0.045
Ours	62	0.915	0.988	0.997	0.098	0.344	0.042
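
The column names in the table follow the usual depth-estimation conventions: delta1 to delta3 are threshold accuracies (higher is better), and AbsRel, RMSE and log10 are error measures (lower is better). The sketch below shows one common way to compute them, assuming the standard thresholds 1.25, 1.25^2 and 1.25^3 and flat sequences of strictly positive depth values. The function name `depth_metrics` is hypothetical, and the evaluation protocol behind the reported numbers (for example cropping or depth capping) is not reproduced here.

```python
import math


def depth_metrics(pred, gt):
    """Compute common depth metrics for two equal-length sequences of positive depths."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(pred)
    pairs = list(zip(pred, gt))

    # Threshold accuracy: share of pixels whose ratio max(p/g, g/p) is below 1.25^k.
    ratios = [max(p / g, g / p) for p, g in pairs]

    def delta(k):
        return sum(r < 1.25 ** k for r in ratios) / n

    return {
        "delta1": delta(1),
        "delta2": delta(2),
        "delta3": delta(3),
        # Mean absolute error relative to the ground-truth depth.
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        # Root mean squared error in the depth unit of the inputs.
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        # Mean absolute difference of base-10 log depths.
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }
```

A call such as `depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0])` returns a dictionary with the six values for three pixels. In practice the inputs would be the valid pixels of a predicted and a ground-truth depth map, flattened.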

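Achieves state-of-the-art or competitive results on NYU Depth V2 using a single encoder and without pretraining on large-scale external datasets.
The compact decoder equipped with SFF outperforms baselines that use deconvolution or UNet-style decoders while having fewer parameters (0.66M in some configurations).
Vertical CutDepth improves on the baseline CutDepth, with the best results obtained at p=0.75.
On NYU Depth V2, the proposed method reaches delta1=0.915, delta2=0.988, delta3=0.997, AbsRel=0.098, RMSE=0.344, and log10=0.042 with 62M parameters.
The model generalizes strongly to SUN RGB-D without fine-tuning and is robust to image corruptions.
Extensive ablation experiments indicate that the decoder design and vertical CutDepth are the key contributors to the performance gains.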

This review was generated by AI and checked by a human editor.