QUICK REVIEW

[Paper Review] P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior

Vaishakh Patil, Christos Sakaridis|arXiv (Cornell University)|Jan 1, 2022

Advanced Vision and Imaging | Cited by 6

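ONE-SENTENCE SUMMARY

P3Depth is a supervised monocular depth estimation method built on a piecewise planarity prior: it predicts plane coefficients and offset vectors to identify seed pixels that share the same 3D plane. A learned confidence map fuses the prediction of the direct depth head with that of the planarity-guided head, yielding state-of-the-art results on NYU Depth-v2 and KITTI, with sharp depth discontinuities and consistent 3D reconstructions.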

ABSTRACT

Monocular depth estimation is vital for scene understanding and downstream tasks. We focus on the supervised setup, in which ground-truth depth is available only at training time. Based on knowledge about the high regularity of real 3D scenes, we propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth. In particular, we introduce a piecewise planarity prior which states that for each pixel, there is a seed pixel which shares the same planar 3D surface with the former. Motivated by this prior, we design a network with two heads. The first head outputs pixel-level plane coefficients, while the second one outputs a dense offset vector field that identifies the positions of seed pixels. The plane coefficients of seed pixels are then used to predict depth at each position. The resulting prediction is adaptively fused with the initial prediction from the first head via a learned confidence to account for potential deviations from precise local planarity. The entire architecture is trained end-to-end thanks to the differentiability of the proposed modules and it learns to predict regular depth maps, with sharp edges at occlusion boundaries. An extensive evaluation of our method shows that we set the new state of the art in supervised monocular depth estimation, surpassing prior methods on NYU Depth-v2 and on the Garg split of KITTI. Our method delivers depth maps that yield plausible 3D reconstructions of the input scenes. Code is available at: https://github.com/SysCV/P3Depth

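MOTIVATION AND OBJECTIVES

Improve monocular depth estimation by exploiting the high regularity of real-world 3D scenes, in particular their piecewise planar structure.
Address the scale ambiguity of monocular depth estimation by introducing a geometric prior that enforces local planarity.
Develop a differentiable, end-to-end trainable architecture that implicitly learns to group pixels by planar region and thereby produces sharper depth boundaries.
Reach state-of-the-art performance on standard benchmarks while generalizing well in zero-shot transfer settings.
Produce depth maps suitable for high-quality 3D reconstruction by preserving depth discontinuities at occlusions.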

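PROPOSED METHOD

The network uses a two-head architecture: one head predicts dense per-pixel plane coefficients (a, b, c) that represent the local 3D plane.
The second head predicts a dense offset vector field (dx, dy) and a confidence map, which together identify seed pixels lying on the same plane.
The plane coefficients of the seed pixels are resampled using the predicted offsets, which yields a second depth prediction.
The two depth predictions are fused adaptively, with the confidence map acting as a learned fusion weight, to handle regions that deviate from local planarity.
A mean plane loss is introduced to enforce first-order consistency between the predicted and ground-truth depth surfaces.
The whole model is trained with supervision on the final fused depth prediction only, so the offsets and the confidence are supervised implicitly.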
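
The pipeline above (plane coefficients, offset-based resampling, confidence-weighted fusion) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. The plane parameterization 1/Z = a*u + b*v + c is an assumption, chosen because the inverse depth of a plane seen by a pinhole camera is affine in the image coordinates; the paper's exact parameterization may differ. Nearest-neighbor lookup is used for brevity; an end-to-end trainable version would need a differentiable sampler such as bilinear interpolation. The fusion is assumed to be a per-pixel convex combination in which the confidence weights the seed-based prediction.

```python
import numpy as np

def depth_from_planes(coeffs, u, v, eps=1e-6):
    """Convert plane coefficients of shape (3, H, W) to depth of shape (H, W).

    Assumed parameterization (not confirmed by the source): 1/Z = a*u + b*v + c.
    """
    a, b, c = coeffs
    inv_depth = a * u + b * v + c
    return 1.0 / np.maximum(inv_depth, eps)

def resample_nearest(coeffs, offsets):
    """Fetch, for every pixel p, the coefficients stored at its seed p + offset(p).

    offsets has shape (2, H, W) and holds (dx, dy) in pixels. Seed positions are
    rounded to the nearest pixel and clamped to the image bounds.
    """
    _, h, w = coeffs.shape
    ys, xs = np.mgrid[0:h, 0:w]
    seed_x = np.clip(np.rint(xs + offsets[0]), 0, w - 1).astype(int)
    seed_y = np.clip(np.rint(ys + offsets[1]), 0, h - 1).astype(int)
    return coeffs[:, seed_y, seed_x]

def fuse_depths(d_init, d_seed, conf):
    """Blend the two predictions; conf in [0, 1] weights the seed-based depth."""
    return conf * d_seed + (1.0 - conf) * d_init

# Toy scene: a slanted plane on the left half, a fronto-parallel plane on the right.
h, w = 4, 6
ys, xs = np.mgrid[0:h, 0:w]
u, v = xs / w, ys / h
clean = np.zeros((3, h, w))
clean[:, :, :3] = np.array([0.1, 0.0, 0.5])[:, None, None]
clean[:, :, 3:] = np.array([0.0, 0.0, 0.25])[:, None, None]

# Corrupt the coefficients at pixel (row 1, col 1), then point that pixel at its
# right-hand neighbor, which lies on the same plane, and trust the seed there.
noisy = clean.copy()
noisy[:, 1, 1] = [0.0, 0.0, 1.0]
offsets = np.zeros((2, h, w))
offsets[0, 1, 1] = 1.0
conf = np.zeros((h, w))
conf[1, 1] = 1.0

d_init = depth_from_planes(noisy, u, v)
d_seed = depth_from_planes(resample_nearest(noisy, offsets), u, v)
d_fused = fuse_depths(d_init, d_seed, conf)
```

In this toy case the seed's coefficients are evaluated at the corrupted pixel's own coordinates, so the fused map recovers the clean depth there, while every pixel with zero confidence keeps its initial prediction.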

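EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Does a piecewise planarity prior improve monocular depth estimation by encouraging information sharing between coplanar pixels?
RQ2: How can a differentiable, end-to-end network learn to identify seed pixels on the same plane without explicit supervision?
RQ3: How does adaptive fusion of the direct depth prediction and the planarity-guided prediction affect depth accuracy and edge sharpness?
RQ4: How does the proposed mean plane loss improve generalization and surface consistency?
RQ5: Does the method generalize effectively to unseen domains in a zero-shot setting, without fine-tuning?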

KEY FINDINGS

P3Depth sets a new state of the art on NYU Depth-v2, outperforming prior methods on all standard metrics (A.Rel ↓ 0.104, RMSE ↓ 0.356, δ1 ↑ 0.898).
On the Garg split of KITTI, P3Depth likewise surpasses prior methods.
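In zero-shot transfer, P3Depth outperforms the previous state of the art on ScanNet, SUN-RGBD, DIODE Indoor, and ETH-3D, most notably in RMSE and δ1.
Ablations show that combining plane coefficients with offset-based refinement substantially improves over direct depth prediction (RMSE: 0.458 → 0.356).
The mean plane loss improves performance further, lowering RMSE by 0.016 relative to the ablation without it.
Qualitative results show sharp depth edges at occlusions and consistent 3D reconstructions, even under illumination changes or on specular surfaces.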
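
For readers unfamiliar with the metrics quoted above, the following sketch computes A.Rel, RMSE, and δ1 using their standard definitions in the depth estimation literature. The function name and the validity mask (ground truth greater than zero) are assumptions for illustration; benchmark protocols additionally apply dataset-specific crops and depth caps that are not reproduced here.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Return A.Rel, RMSE, and delta1 over pixels with valid ground truth.

    Pixels with gt <= 0 are treated as missing (an assumed convention).
    """
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)       # lower is better
    rmse = np.sqrt(np.mean((pred - gt) ** 2))       # lower is better
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)                  # higher is better
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}

# Three valid pixels and one pixel without ground truth.
gt = np.array([1.0, 2.0, 4.0, 0.0])
pred = np.array([1.0, 2.0, 6.0, 3.0])
metrics = depth_metrics(pred, gt)
```

Here two of the three valid pixels are exact and the third is off by a factor of 1.5, so δ1 counts two of three pixels as accurate.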

This review was generated by AI and checked by a human editor.