QUICK REVIEW

[Paper Review] Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image

Fangchang Ma, Sertaç Karaman | arXiv (Cornell University) | Sep 21, 2017

Advanced Vision and Imaging | References: 20 | Cited by: 100

One-Sentence Summary

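This paper proposes a single deep regression network that predicts dense depth by combining an RGB image with sparse depth samples; on NYU-Depth-v2 and KITTI, adding only about 100 depth samples substantially improves accuracy over RGB-only methods.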

ABSTRACT

We consider the problem of dense depth prediction from a sparse set of depth measurements and a single RGB image. Since depth estimation from monocular images alone is inherently ambiguous and unreliable, to attain a higher level of robustness and accuracy, we introduce additional sparse depth samples, which are either acquired with a low-resolution depth sensor or computed via visual Simultaneous Localization and Mapping (SLAM) algorithms. We propose the use of a single deep regression network to learn directly from the RGB-D raw data, and explore the impact of number of depth samples on prediction accuracy. Our experiments show that, compared to using only RGB images, the addition of 100 spatially random depth samples reduces the prediction root-mean-square error by 50% on the NYU-Depth-v2 indoor dataset. It also boosts the percentage of reliable prediction from 59% to 92% on the KITTI dataset. We demonstrate two applications of the proposed algorithm: a plug-in module in SLAM to convert sparse maps to dense maps, and super-resolution for LiDARs. Software and video demonstration are publicly available.

Motivation and Objectives

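Make depth estimation more robust by fusing RGB with sparse depth samples obtained from a low-resolution depth sensor or from SLAM output.
Propose a single CNN architecture that takes RGB-D data (RGB plus sparse depth) as input and predicts dense depth.
Evaluate how the number of depth samples affects prediction accuracy on an indoor (NYU-Depth-v2) and an outdoor (KITTI) dataset.
Demonstrate practical applications: densifying SLAM/VIO maps and LiDAR super-resolution.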

Proposed Method

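A CNN with a ResNet-based encoder (ResNet-18 for KITTI, ResNet-50 for NYU-Depth-v2) and a decoder made of four upsampling layers (UpProj).
During training, the sparse depth input is sampled online from the ground truth with a Bernoulli scheme: each valid depth pixel is kept with probability p = m/n, where m is the target number of samples and n is the total number of valid depth pixels.
Online data augmentation (scaling, rotation, color jitter, normalization, flipping), using nearest-neighbor interpolation so that the sparse depth points are preserved.
L1 loss as the default training objective (relatively robust to outliers and better at preserving edges).
A comparison of upsampling modules (DeConv, UpConv, UpProj) and first-layer convolutions (Conv, DepthWise, ChanDrop) to find the best-performing configuration.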
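The Bernoulli sampling step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `bernoulli_sparse_depth`, the list-of-lists depth representation, and the convention that 0.0 marks a missing depth value are all choices made here. Because each of the n valid pixels is kept with probability p = m/n, the expected number of kept samples is n · p = m, although the actual count varies from draw to draw.

```python
import random


def bernoulli_sparse_depth(depth, num_samples, rng=None):
    """Draw a sparse depth map from a dense ground-truth map (hypothetical helper).

    depth: 2-D list of floats in meters; 0.0 marks a missing measurement.
    num_samples: target sample count m.
    Each valid pixel is kept independently with probability p = m / n,
    where n is the number of valid pixels, so about m pixels survive on average.
    """
    rng = rng or random.Random()
    n = sum(1 for row in depth for d in row if d > 0)
    if n == 0:
        # No valid ground truth: return an all-missing map of the same shape.
        return [[0.0] * len(row) for row in depth]
    p = min(1.0, num_samples / n)
    # Keep a valid pixel if its uniform draw falls below p; zero out the rest.
    return [[d if d > 0 and rng.random() < p else 0.0 for d in row] for row in depth]


if __name__ == "__main__":
    # A synthetic 480x640 ground-truth map in which every pixel is valid.
    gt = [[1.0 + 0.01 * (r + c) for c in range(640)] for r in range(480)]
    sparse = bernoulli_sparse_depth(gt, 100, random.Random(0))
    kept = sum(1 for row in sparse for d in row if d > 0)
    print(kept)  # close to 100; the exact count depends on the seed
```

Drawing a fresh sample each time an image is loaded means the network sees a different sparse pattern of the same scene in every epoch; the resulting sparse map is then combined with the RGB image to form the RGB-D input.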
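Why the augmentation step uses nearest-neighbor interpolation can be seen in a small sketch of the scaling transform applied to the sparse depth channel. The helper `scale_sparse_depth_nearest` is hypothetical, and dividing depth values by the scale factor s is an assumption here, based on a convention common in monocular depth training (enlarging the image mimics moving the camera closer). A bilinear resize would average each measured sample with its zero-valued neighbors and invent intermediate depths that were never measured; nearest-neighbor sampling only copies values, so every output pixel is either 0.0 or a single rescaled measurement.

```python
def scale_sparse_depth_nearest(sparse, s):
    """Scale a sparse depth map by factor s with nearest-neighbor sampling.

    sparse: 2-D list of floats; 0.0 marks a missing measurement.
    s: scale factor (> 0). Output size is floor(h * s) x floor(w * s).
    Assumption: depth values are divided by s, so the enlarged image
    corresponds to a camera that is proportionally closer to the scene.
    """
    h, w = len(sparse), len(sparse[0])
    out_h, out_w = int(h * s), int(w * s)
    out = []
    for i in range(out_h):
        # Map each output index back to a source index by flooring i / s,
        # a simple nearest-neighbor rule; clamp to stay inside the source.
        src_r = min(h - 1, int(i / s))
        row = []
        for j in range(out_w):
            src_c = min(w - 1, int(j / s))
            d = sparse[src_r][src_c]
            # Copy, never blend: the output is 0.0 or one rescaled sample.
            row.append(d / s if d > 0 else 0.0)
        out.append(row)
    return out
```

For example, scaling a 3×3 map that holds a single 3.0 m sample by s = 1.5 yields a 4×4 map whose only values are 0.0 and 2.0; a bilinear resize of the same map would also produce fractional depths around the sample.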
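A minimal sketch of the L1 objective follows. It assumes that pixels without ground truth (stored as 0.0, as in the sparse LiDAR-derived ground truth of KITTI) are excluded from the average; the helper name and data layout are illustrative, not taken from the paper's code.

```python
def masked_l1_loss(pred, target):
    """Mean absolute error over pixels with valid ground truth (target > 0).

    pred, target: 2-D lists of floats with the same shape, in meters.
    Pixels where target == 0.0 (no ground truth) are excluded.
    """
    total, count = 0.0, 0
    for pred_row, tgt_row in zip(pred, target):
        for p, t in zip(pred_row, tgt_row):
            if t > 0:
                total += abs(p - t)  # linear penalty: large residuals are not squared
                count += 1
    if count == 0:
        raise ValueError("target has no valid pixels")
    return total / count
```

The robustness argument is visible in the arithmetic: a prediction of 10 m against a true depth of 4 m adds 6 to the L1 sum but 36 to an L2 (squared-error) sum, so a single bad pixel dominates an L2 objective far more than an L1 one.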
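The UpConv and UpProj decoder modules compared above originate, to our understanding, in earlier depth-prediction work by Laina et al. (2016), where each upsampling step begins with a 2×2 unpooling that places every input value in the top-left corner of a 2×2 block, fills the other three positions with zeros, and is then followed by a convolution. The sketch below shows only that unpooling step on a single-channel map; the convolution, activation, and (for UpProj) the projection branch are omitted, and the helper name `unpool2x` is hypothetical.

```python
def unpool2x(feature):
    """2x2 unpooling of a single-channel map (hypothetical helper).

    Each input value is written to the top-left corner of its 2x2 output
    block; the other three positions are zero. In the decoder a convolution
    would follow to fill in the gaps; it is omitted here.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for c in range(w):
            out[2 * r][2 * c] = feature[r][c]
    return out


if __name__ == "__main__":
    print(unpool2x([[1.0, 2.0]]))
    # [[1.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
```

Each such step doubles the spatial resolution, so four of them enlarge the encoder's feature map by a factor of 16 in each dimension.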

Experimental Results

Research Questions

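RQ1: How much does adding sparse depth samples to the RGB input improve dense depth prediction compared with RGB alone?
RQ2: How does the number of sparse depth samples affect prediction accuracy on indoor and outdoor datasets?
RQ3: Can the RGB+sparse-depth model serve as a SLAM/VIO plug-in for producing dense maps, and can it perform LiDAR super-resolution?
RQ4: Which network design choices (encoder type, upsampling method, first convolution layer) give the best depth prediction performance?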

Key Findings

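On NYU-Depth-v2, adding 100 sparse depth samples reduces RMSE by about 50% relative to RGB-only input.
On KITTI, 100 sparse depth samples raise the percentage of reliable predictions from 59% to 92%.
RGBd input (RGB plus about 100 sparse depth samples) clearly outperforms either RGB alone or sparse depth alone, and accuracy keeps improving as the sample count grows through 200–1000 until it saturates.
On NYU-Depth-v2, RGBd with 100 samples reaches an RMSE of about 0.25 m and a REL of about 0.05, clearly better than RGB-based methods that use no depth input.
On KITTI, RGBd with 100 samples reaches an RMSE of about 3.5 m and a REL of about 0.07, outperforming RGB-only input as well as some fusion methods, even though it uses significantly fewer depth samples.
The method makes it possible to reconstruct dense maps from sparse SLAM/VIO keypoints and to perform LiDAR super-resolution, while requiring relatively little depth input.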
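The RMSE, REL, and "reliable prediction" figures above can be made concrete with a small evaluation sketch. It assumes the definitions usual in the monocular depth literature: RMSE and mean absolute relative error (REL) computed over pixels with valid ground truth, and "reliable prediction" read as the δ1 accuracy, i.e. the fraction of pixels where max(ŷ/y, y/ŷ) < 1.25. That last mapping is an interpretation, and the helper name `depth_metrics` is illustrative.

```python
import math


def depth_metrics(pred, target):
    """RMSE, mean absolute relative error (REL), and delta_1 accuracy.

    pred, target: 2-D lists of floats in meters; only pixels with
    target > 0 are evaluated. delta_1 is the fraction of evaluated pixels
    where max(pred / target, target / pred) < 1.25.
    """
    sq_err, rel_err, within, n = 0.0, 0.0, 0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if t <= 0:
                continue  # no ground truth at this pixel
            n += 1
            sq_err += (p - t) ** 2
            rel_err += abs(p - t) / t
            # A non-positive prediction can never fall within the threshold.
            if p > 0 and max(p / t, t / p) < 1.25:
                within += 1
    if n == 0:
        raise ValueError("no valid ground-truth pixels")
    return {"rmse": math.sqrt(sq_err / n), "rel": rel_err / n, "delta1": within / n}
```

On this reading, the KITTI result means the share of pixels predicted within a factor of 1.25 of the true depth rose by 33 percentage points (from 59% to 92%).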
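For the SLAM/VIO plug-in use, the sparse 3-D landmarks must first be turned into a sparse depth image aligned with the RGB frame. One plausible way to do this, sketched below under a standard pinhole camera model, is to project each landmark with the intrinsics (fx, fy, cx, cy) and keep the nearest depth when several points land on the same pixel. The function and parameter names are hypothetical, the points are assumed to be already expressed in the current camera frame (z pointing forward, in meters), and this is not the authors' integration code.

```python
def keypoints_to_sparse_depth(points_cam, fx, fy, cx, cy, height, width):
    """Rasterize 3-D points (camera frame) into a sparse depth image.

    points_cam: iterable of (x, y, z) tuples, z along the optical axis.
    Uses the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    When several points hit one pixel, the closest (smallest z) is kept.
    Pixels with no projected point stay 0.0.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points_cam:
        if z <= 0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth
```

The output uses the same 0.0-for-missing convention as the sparse inputs sampled during training, so a network trained that way can in principle consume it directly; note, however, that SLAM keypoints cluster on textured regions rather than being spatially uniform.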

This review was generated by AI and reviewed by human editors.