QUICK REVIEW

[Paper Review] Wasserstein Distances for Stereo Disparity Estimation

Divyansh Garg, Yan Wang|arXiv (Cornell University)|Jul 6, 2020

Advanced Vision and Imaging | References: 58 | Cited by: 46

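ONE-SENTENCE SUMMARY

Introduces a continuous disparity network (CDN) that produces a disparity distribution by outputting an offset for each discrete disparity value, and trains it with the Wasserstein distance, improving disparity/depth estimation and downstream 3D object detection.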

ABSTRACT

Existing approaches to depth or disparity estimation output a distribution over a set of pre-defined discrete values. This leads to inaccurate results when the true depth or disparity does not match any of these values. The fact that this distribution is usually learned indirectly through a regression loss causes further problems in ambiguous regions around object boundaries. We address these issues using a new neural network architecture that is capable of outputting arbitrary depth values, and a new loss function that is derived from the Wasserstein distance between the true and the predicted distributions. We validate our approach on a variety of tasks, including stereo disparity and depth estimation, and the downstream 3D object detection. Our approach drastically reduces the error in ambiguous regions, especially around object boundaries that greatly affect the localization of objects in 3D, achieving the state-of-the-art in 3D object detection for autonomous driving. Our code will be available at https://github.com/Div99/W-Stereo-Disp.

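MOTIVATION AND OBJECTIVES

Enable more accurate depth/disparity estimation by moving beyond a fixed set of discrete disparity bins, and improve results in boundary regions where depth is ambiguous.
Propose a neural network that outputs continuous disparity values by adding offsets to a set of discrete disparities.
Develop a loss function based on the Wasserstein distance to align the predicted disparity distribution with the true one.
Support multi-modal ground truth to capture depth uncertainty at object boundaries.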

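PROPOSED METHOD

Introduce a continuous disparity network (CDN) that outputs a disparity distribution by predicting an offset for each discrete disparity value.
Replace the standard regression loss with a loss based on the Wasserstein distance, so that the predicted disparity distribution is matched directly to the ground-truth distribution.
Add an offset sub-network that, for each discrete disparity value, predicts a real-valued offset that shifts the probability mass, producing a distribution over continuous disparity values.
Represent the ground-truth disparity as a (potentially multi-modal) distribution and compute the Wasserstein distance (W1 or W2) to train the model.
Obtain multi-modal ground truth by building a distribution from a neighborhood, and compute the loss during training with the one-dimensional Wasserstein distance or its CDF-based one-dimensional formulation.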
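The loss described above can be illustrated with a small, framework-free sketch. It assumes the predicted distribution places weight p_i at location d_i + b_i (discrete disparity plus predicted offset). When the ground truth is a single value, the Wasserstein distance to that point mass has a closed form; for a multi-modal ground truth, W1 in one dimension equals the area between the two CDFs. The function names and the toy numbers are illustrative assumptions, not taken from the authors' code.

```python
def wasserstein_to_point(probs, disparities, offsets, target, p=1):
    """W_p^p between the predicted distribution and a point mass at `target`.

    All mass must travel to `target`, so the cost is the weighted sum of
    |d_i + b_i - target|^p over the predicted atoms.
    """
    return sum(w * abs(d + b - target) ** p
               for w, d, b in zip(probs, disparities, offsets))


def wasserstein1_1d(atoms_a, weights_a, atoms_b, weights_b):
    """W1 between two discrete 1D distributions via the CDF formulation.

    W1 = integral of |CDF_a(x) - CDF_b(x)| dx. Both CDFs are step functions,
    so the integral is a sum over the gaps between consecutive atoms.
    """
    # Tag each atom with the weight it contributes to each distribution.
    events = sorted(
        [(x, w, 0.0) for x, w in zip(atoms_a, weights_a)]
        + [(x, 0.0, w) for x, w in zip(atoms_b, weights_b)]
    )
    total, cdf_a, cdf_b = 0.0, 0.0, 0.0
    prev_x = events[0][0]
    for x, w_a, w_b in events:
        # Area accumulated since the previous atom, using the CDF gap there.
        total += abs(cdf_a - cdf_b) * (x - prev_x)
        cdf_a += w_a
        cdf_b += w_b
        prev_x = x
    return total


# Toy pixel: three disparity bins, each with a probability and an offset.
disparities = [10.0, 11.0, 12.0]
probs = [0.2, 0.5, 0.3]
offsets = [0.1, -0.2, 0.0]
atoms = [d + b for d, b in zip(disparities, offsets)]

# Single-mode ground truth at disparity 10.8.
loss_w1 = wasserstein_to_point(probs, disparities, offsets, target=10.8, p=1)

# Hypothetical multi-modal ground truth: mostly foreground, some background.
loss_mm = wasserstein1_1d(atoms, probs, [10.8, 30.0], [0.8, 0.2])
```

For a point-mass target, both functions agree, which makes the closed form a cheap special case of the CDF computation. A training implementation would express the same arithmetic with differentiable tensor operations.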

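EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a neural network output a continuous disparity distribution rather than a single integer disparity value?
RQ2: Does learning with the Wasserstein distance improve accuracy, especially at object boundaries where depth is ambiguous?
RQ3: How does predicting an offset for each disparity bin affect mode handling and convergence?
RQ4: How does multi-modal ground truth affect training efficiency and accuracy in depth/disparity estimation?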

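Key Findings

CDN achieves lower disparity error than the baselines on Scene Flow and KITTI 2015, particularly in foreground regions.
Mode-based offset prediction combined with the Wasserstein loss improves estimates at boundary pixels and reduces multi-modal ambiguity.
Training with multi-modal (MM) ground truth speeds up convergence and improves boundary accuracy by better handling the inherently multi-modal disparities there.
The disparity results show CDN variants outperforming the PSMNet and GANet Deep baselines on several metrics; downstream 3D object detection improves markedly when using CDN depth.
CDN improves disparity at object boundaries, giving sharper foreground/background separation in the qualitative results.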
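To make the boundary behavior concrete, here is a minimal sketch contrasting a mode-based readout with the expectation used by regression-style readouts. It assumes the mode-based prediction takes the most probable bin and adds that bin's offset; the helper names and the bimodal toy pixel are hypothetical.

```python
def predict_mode(probs, disparities, offsets):
    """Pick the most probable bin and shift it by its predicted offset."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return disparities[best] + offsets[best]


def predict_mean(probs, disparities, offsets):
    """Expected disparity over all shifted bins (regression-style readout)."""
    return sum(w * (d + b) for w, d, b in zip(probs, disparities, offsets))


# Toy boundary pixel: mass split between a background surface (around 11)
# and a foreground surface (around 40).
disparities = [10.0, 11.0, 40.0, 41.0]
probs = [0.05, 0.5, 0.4, 0.05]
offsets = [0.0, 0.25, -0.25, 0.0]

mode_disp = predict_mode(probs, disparities, offsets)  # lands on one surface
mean_disp = predict_mean(probs, disparities, offsets)  # lands between surfaces
```

With a bimodal distribution, the expectation falls between the two surfaces, at a disparity that matches neither, while the mode commits to one of them. This is one way to read the sharper foreground/background separation reported above.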


This review was generated by AI and reviewed by human editors.