QUICK REVIEW

[Paper Review] Cascade Residual Learning: A Two-stage Convolutional Neural Network for Stereo Matching

Jiahao Pang, Wenxiu Sun|arXiv (Cornell University)|Aug 30, 2017

Advanced Vision and Imaging|References: 21|Cited by: 54

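ONE-SENTENCE SUMMARY

A two-stage cascade CNN (DispFulNet + DispResNet) refines an initial full-resolution disparity through multi-scale residual learning, achieving state-of-the-art results on the KITTI 2015 stereo benchmark while remaining efficient.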

ABSTRACT

Leveraging on the recent developments in convolutional neural networks (CNNs), matching dense correspondence from a stereo pair has been cast as a learning problem, with performance exceeding traditional approaches. However, it remains challenging to generate high-quality disparities for the inherently ill-posed regions. To tackle this problem, we propose a novel cascade CNN architecture composing of two stages. The first stage advances the recently proposed DispNet by equipping it with extra up-convolution modules, leading to disparity images with more details. The second stage explicitly rectifies the disparity initialized by the first stage; it couples with the first-stage and generates residual signals across multiple scales. The summation of the outputs from the two stages gives the final disparity. As opposed to directly learning the disparity at the second stage, we show that residual learning provides more effective refinement. Moreover, it also benefits the training of the overall cascade network. Experimentation shows that our cascade residual learning scheme provides state-of-the-art performance for matching stereo correspondence. By the time of the submission of this paper, our method ranks first in the KITTI 2015 stereo benchmark, surpassing the prior works by a noteworthy margin.

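MOTIVATION AND OBJECTIVES

Improve disparity estimation in difficult stereo regions (occlusions, low texture, repetitive patterns).
Propose a two-stage cascade architecture that produces a high-quality initial disparity and refines it through residual learning.
Demonstrate end-to-end trainability and evaluate against state-of-the-art stereo methods on standard benchmarks.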

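PROPOSED METHOD

Stage 1 (DispFulNet): extends DispNetC with extra up-convolution modules to produce a full-resolution disparity with detailed boundaries.
Stage 2 (DispResNet): a multi-scale residual network that learns residuals at several scales to correct the initial disparity, with supervision at each scale.
Warping layer: a differentiable warp driven by the stage-1 disparity d1 maps the right image to a synthesized left view, which is fed to stage 2.
Residual learning: the final disparity is the sum of the two stages' outputs, d2 = d1 + the stage-2 residual, with a residual predicted at each scale; this makes refinement easier and more stable than regressing the disparity directly.
End-to-end training: the two stages are optimized jointly with a multi-scale L1 loss applied across stages and scales.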
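
The warping, residual update, and multi-scale L1 loss listed above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names, the sign convention (left pixel x sampled from right pixel x - d), the average-pooling downsampler, and the per-scale weights are all assumptions made for the example, and the residuals that DispResNet would predict are simply passed in as arrays. It follows one reading of the method, in which the residual at each scale is added to d1 resized to that scale.

```python
import numpy as np

def warp_right_to_left(right, disp):
    """Synthesize a left view from a grayscale right image of shape (H, W).

    Assumed convention: left pixel (y, x) is sampled from right pixel (y, x - disp).
    Sampling is linear along x, so the operation is differentiable in disp
    when written in an autodiff framework.
    """
    h, w = disp.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disp  # source column per pixel
    xs = np.clip(xs, 0.0, w - 1.0)                       # clamp at image borders
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def downsample(disp, factor):
    """Average-pool a map by an integer factor (an assumed resizing choice).

    Values stay in full-resolution pixel units; H and W must be divisible by factor.
    """
    h, w = disp.shape
    return disp.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def refine(d1, residuals):
    """Per scale s, add the residual to d1 resized to that scale.

    residuals[s] is assumed to have shape (H / 2**s, W / 2**s).
    """
    return [downsample(d1, 2 ** s) + r for s, r in enumerate(residuals)]

def multiscale_l1(preds, gt, weights):
    """Weighted sum over scales of the mean absolute disparity error."""
    total = 0.0
    for s, (pred, weight) in enumerate(zip(preds, weights)):
        total += weight * np.abs(pred - downsample(gt, 2 ** s)).mean()
    return total
```

In this sketch, stage 2 would receive the left image, the right image, d1, and the output of `warp_right_to_left`; `refine(d1, residuals)[0]` plays the role of the final full-resolution disparity d2.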

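EXPERIMENTAL RESULTS

Research questions

RQ1: Does a two-stage cascade CNN improve disparity estimation in difficult regions compared with a single-stage network?
RQ2: Does supervising residuals at multiple scales refine the disparity better than learning the disparity directly?
RQ3: How does CRL affect accuracy and runtime on standard stereo benchmarks (KITTI 2015, FlyingThings3D, Middlebury)?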

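Key findings

CRL achieves state-of-the-art disparity estimation on KITTI 2015, ranking first on the online leaderboard at the time of submission.
The stage-1 DispFulNet produces more detailed disparities than DispNetC.
The stage-2 DispResNet yields additional gains over DispNetS by refining residuals at multiple scales.
End-to-end training with residual supervision improves optimization and generalization, and outperforms learning the disparity directly.
CRL processes a KITTI 2015 stereo pair in about 0.47 seconds on a GTX 1080, a competitive runtime.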
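
This summary does not state which accuracy metrics lie behind the findings above. As a hedged illustration, the sketch below computes two measures commonly used for stereo benchmarks: the mean end-point error, and an outlier rate in the style of the KITTI 2015 benchmark, where a pixel counts as wrong when its disparity error exceeds both 3 px and 5% of the ground truth. The function names, the default thresholds as parameters, and the convention that a ground-truth value of 0 marks an invalid pixel are assumptions for this example.

```python
import numpy as np

def end_point_error(pred, gt, valid=None):
    """Mean absolute disparity error over valid pixels."""
    if valid is None:
        valid = gt > 0  # assumed convention: 0 marks missing ground truth
    return float(np.abs(pred - gt)[valid].mean())

def outlier_rate(pred, gt, valid=None, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of valid pixels whose error exceeds both thresholds."""
    if valid is None:
        valid = gt > 0  # assumed convention: 0 marks missing ground truth
    err = np.abs(pred - gt)
    bad = (err > abs_thresh) & (err > rel_thresh * gt)
    return float(bad[valid].mean())
```

For example, with ground truth `[10, 100, 50, 0]` and prediction `[14, 104, 50, 7]`, only the first pixel is an outlier: the second has the same 4 px error but stays within 5% of its ground truth, and the last is ignored as invalid.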

This review was generated by AI and reviewed by a human editor.